Overview

Dataset statistics

Number of variables: 25
Number of observations: 45376
Missing cells: 156473
Missing cells (%): 13.8%
Duplicate rows: 0
Duplicate rows (%): 0.0%
Total size in memory: 8.7 MiB
Average record size in memory: 200.0 B
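
The derived figures in this table can be re-checked from the raw counts. The sketch below assumes that the missing-cell percentage is taken over every cell (variables × observations) and that 1 MiB means 2**20 bytes; the variable names are illustrative and not part of the report.

```python
# Figures copied from the "Dataset statistics" table.
n_variables = 25
n_observations = 45376
missing_cells = 156473
avg_record_bytes = 200.0

# Assumption: the percentage is computed over all cells in the table.
total_cells = n_variables * n_observations
missing_pct = 100 * missing_cells / total_cells

# Assumption: 1 MiB = 2**20 bytes.
total_mib = avg_record_bytes * n_observations / 2**20

print(f"total cells:   {total_cells}")
print(f"missing cells: {missing_pct:.1f}%")
print(f"memory:        {total_mib:.1f} MiB")
```

Under these assumptions the recomputed values round to the 13.8% and 8.7 MiB shown in the table.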

Variable types

Numeric: 10
Categorical: 15

Alerts

original_language has a high cardinality: 89 distinct values | High cardinality
overview has a high cardinality: 44232 distinct values | High cardinality
release_date has a high cardinality: 17333 distinct values | High cardinality
tagline has a high cardinality: 20269 distinct values | High cardinality
title has a high cardinality: 42196 distinct values | High cardinality
name_collection has a high cardinality: 1695 distinct values | High cardinality
id_genres has a high cardinality: 4064 distinct values | High cardinality
name_genres has a high cardinality: 4064 distinct values | High cardinality
id_production has a high cardinality: 22702 distinct values | High cardinality
name_production has a high cardinality: 22667 distinct values | High cardinality
id_countrie has a high cardinality: 2388 distinct values | High cardinality
name_countrie has a high cardinality: 2388 distinct values | High cardinality
id_language has a high cardinality: 1930 distinct values | High cardinality
name_language has a high cardinality: 1841 distinct values | High cardinality
Unnamed: 0 is highly overall correlated with id | High correlation
budget is highly overall correlated with revenue and 1 other field | High correlation
id is highly overall correlated with Unnamed: 0 | High correlation
revenue is highly overall correlated with budget and 1 other field | High correlation
return is highly overall correlated with budget and 1 other field | High correlation
original_language is highly imbalanced (67.4%) | Imbalance
status is highly imbalanced (97.0%) | Imbalance
id_countrie is highly imbalanced (57.7%) | Imbalance
name_countrie is highly imbalanced (57.7%) | Imbalance
id_language is highly imbalanced (61.9%) | Imbalance
name_language is highly imbalanced (62.0%) | Imbalance
overview has 941 (2.1%) missing values | Missing
tagline has 24978 (55.0%) missing values | Missing
id_collection has 40888 (90.1%) missing values | Missing
name_collection has 40888 (90.1%) missing values | Missing
id_genres has 2384 (5.3%) missing values | Missing
name_genres has 2384 (5.3%) missing values | Missing
id_production has 11796 (26.0%) missing values | Missing
name_production has 11796 (26.0%) missing values | Missing
id_countrie has 6211 (13.7%) missing values | Missing
name_countrie has 6211 (13.7%) missing values | Missing
id_language has 3768 (8.3%) missing values | Missing
name_language has 3891 (8.6%) missing values | Missing
popularity is highly skewed (γ1 = 29.21506573) | Skewed
return is highly skewed (γ1 = 138.3295261) | Skewed
Unnamed: 0 is uniformly distributed | Uniform
overview is uniformly distributed | Uniform
tagline is uniformly distributed | Uniform
title is uniformly distributed | Uniform
Unnamed: 0 has unique values | Unique
budget has 36490 (80.4%) zeros | Zeros
revenue has 37969 (83.7%) zeros | Zeros
runtime has 1535 (3.4%) zeros | Zeros
vote_average has 2947 (6.5%) zeros | Zeros
return has 39995 (88.1%) zeros | Zeros
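
The percentages attached to the zero-count alerts can be reproduced from the counts alone. This is a minimal sketch that assumes each share is taken over all 45376 observations (the figure from the dataset statistics); the dictionary name is illustrative.

```python
# Zero counts copied from the alerts; the denominator is the reported
# number of observations (assumption: missing rows are not excluded).
n_observations = 45376
zero_counts = {
    "budget": 36490,
    "revenue": 37969,
    "runtime": 1535,
    "vote_average": 2947,
    "return": 39995,
}

zero_pct = {
    name: round(100 * count / n_observations, 1)
    for name, count in zero_counts.items()
}

for name, pct in zero_pct.items():
    print(f"{name}: {pct}% zeros")
```

With more than 80% zeros in budget, revenue and return, any statistic derived from those columns is dominated by the zero entries.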

Reproduction

Analysis started: 2023-05-13 19:20:17.532822
Analysis finished: 2023-05-13 19:21:05.677397
Duration: 48.14 seconds
Software version: pandas-profiling v0.0.dev0
Download configuration: config.json
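
The reported duration is the difference between the two timestamps. A short sketch using only the standard library, with the timestamps copied verbatim from this section:

```python
from datetime import datetime

# Timestamps copied from the "Reproduction" section.
started = datetime.fromisoformat("2023-05-13 19:20:17.532822")
finished = datetime.fromisoformat("2023-05-13 19:21:05.677397")

# Elapsed wall-clock time in seconds.
duration = (finished - started).total_seconds()
print(f"duration: {duration:.2f} seconds")
```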

Variables

Unnamed: 0
Real number (ℝ)

HIGH CORRELATION  UNIFORM  UNIQUE 

Distinct: 45376
Distinct (%): 100.0%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 22687.5
Minimum: 0
Maximum: 45375
Zeros: 1
Zeros (%): < 0.1%
Negative: 0
Negative (%): 0.0%
Memory size: 354.6 KiB

Quantile statistics

Minimum: 0
5-th percentile: 2268.75
Q1: 11343.75
Median: 22687.5
Q3: 34031.25
95-th percentile: 43106.25
Maximum: 45375
Range: 45375
Interquartile range (IQR): 22687.5

Descriptive statistics

Standard deviation: 13099.067
Coefficient of variation (CV): 0.57736936
Kurtosis: -1.2
Mean: 22687.5
Median Absolute Deviation (MAD): 11344
Skewness: 0
Sum: 1.029468 × 10^9
Variance: 1.7158556 × 10^8
Monotonicity: Strictly increasing
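
These values are what one would expect from a plain row counter 0, 1, ..., n-1. The sketch below recomputes them from closed-form expressions, assuming the report uses the sample (n-1) estimators for variance and standard deviation; that assumption is not stated in the report itself.

```python
import math

n = 45376  # number of observations; values are assumed to be 0 .. n-1

mean = (n - 1) / 2             # mean of 0 .. n-1
total = n * (n - 1) // 2       # sum of 0 .. n-1
variance = n * (n + 1) / 12    # sample variance (ddof=1) of 0 .. n-1
std = math.sqrt(variance)
cv = std / mean                # coefficient of variation

print(f"mean={mean}, sum={total}")
print(f"variance={variance:.1f}, std={std:.3f}, cv={cv:.8f}")
```

The agreement with the reported mean, sum, variance and CV suggests the column is a leftover index rather than a measured quantity.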
Histogram with fixed size bins (bins=50)
Value | Count | Frequency (%)
0 | 1 | < 0.1%
30222 | 1 | < 0.1%
30246 | 1 | < 0.1%
30247 | 1 | < 0.1%
30248 | 1 | < 0.1%
30249 | 1 | < 0.1%
30250 | 1 | < 0.1%
30251 | 1 | < 0.1%
30252 | 1 | < 0.1%
30253 | 1 | < 0.1%
Other values (45366) | 45366 | > 99.9%
Value | Count | Frequency (%)
0 | 1 | < 0.1%
1 | 1 | < 0.1%
2 | 1 | < 0.1%
3 | 1 | < 0.1%
4 | 1 | < 0.1%
5 | 1 | < 0.1%
6 | 1 | < 0.1%
7 | 1 | < 0.1%
8 | 1 | < 0.1%
9 | 1 | < 0.1%
Value | Count | Frequency (%)
45375 | 1 | < 0.1%
45374 | 1 | < 0.1%
45373 | 1 | < 0.1%
45372 | 1 | < 0.1%
45371 | 1 | < 0.1%
45370 | 1 | < 0.1%
45369 | 1 | < 0.1%
45368 | 1 | < 0.1%
45367 | 1 | < 0.1%
45366 | 1 | < 0.1%

budget
Real number (ℝ)

HIGH CORRELATION  ZEROS 

Distinct: 1223
Distinct (%): 2.7%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 4232604.4
Minimum: 0
Maximum: 3.8 × 10^8
Zeros: 36490
Zeros (%): 80.4%
Negative: 0
Negative (%): 0.0%
Memory size: 354.6 KiB

Quantile statistics

Minimum: 0
5-th percentile: 0
Q1: 0
Median: 0
Q3: 0
95-th percentile: 25000000
Maximum: 3.8 × 10^8
Range: 3.8 × 10^8
Interquartile range (IQR): 0

Descriptive statistics

Standard deviation: 17439860
Coefficient of variation (CV): 4.1203614
Kurtosis: 66.634491
Mean: 4232604.4
Median Absolute Deviation (MAD): 0
Skewness: 7.1183385
Sum: 1.9205866 × 10^11
Variance: 3.041487 × 10^14
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)
Value | Count | Frequency (%)
0 | 36490 | 80.4%
5000000 | 286 | 0.6%
10000000 | 259 | 0.6%
20000000 | 243 | 0.5%
2000000 | 242 | 0.5%
15000000 | 226 | 0.5%
3000000 | 223 | 0.5%
25000000 | 206 | 0.5%
1000000 | 197 | 0.4%
30000000 | 190 | 0.4%
Other values (1213) | 6814 | 15.0%
Value | Count | Frequency (%)
0 | 36490 | 80.4%
1 | 25 | 0.1%
2 | 14 | < 0.1%
3 | 9 | < 0.1%
4 | 8 | < 0.1%
5 | 8 | < 0.1%
6 | 5 | < 0.1%
7 | 4 | < 0.1%
8 | 5 | < 0.1%
9 | 1 | < 0.1%
Value | Count | Frequency (%)
380000000 | 1 | < 0.1%
300000000 | 1 | < 0.1%
280000000 | 1 | < 0.1%
270000000 | 1 | < 0.1%
260000000 | 3 | < 0.1%
258000000 | 1 | < 0.1%
255000000 | 1 | < 0.1%
250000000 | 10 | < 0.1%
245000000 | 2 | < 0.1%
237000000 | 1 | < 0.1%

id
Real number (ℝ)

Distinct: 45346
Distinct (%): 99.9%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 108027.1
Minimum: 2
Maximum: 469172
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 354.6 KiB

Quantile statistics

Minimum: 2
5-th percentile: 5348.75
Q1: 26385.75
Median: 59857.5
Q3: 156533.5
95-th percentile: 357194.5
Maximum: 469172
Range: 469170
Interquartile range (IQR): 130147.75

Descriptive statistics

Standard deviation: 112168.38
Coefficient of variation (CV): 1.0383355
Kurtosis: 0.55951556
Mean: 108027.1
Median Absolute Deviation (MAD): 44418.5
Skewness: 1.2830689
Sum: 4.9018378 × 10^9
Variance: 1.2581745 × 10^10
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)
Value | Count | Frequency (%)
141971 | 3 | < 0.1%
97995 | 2 | < 0.1%
10991 | 2 | < 0.1%
109962 | 2 | < 0.1%
119916 | 2 | < 0.1%
159849 | 2 | < 0.1%
84198 | 2 | < 0.1%
132641 | 2 | < 0.1%
168538 | 2 | < 0.1%
99080 | 2 | < 0.1%
Other values (45336) | 45355 | > 99.9%
Value | Count | Frequency (%)
2 | 1 | < 0.1%
3 | 1 | < 0.1%
5 | 1 | < 0.1%
6 | 1 | < 0.1%
11 | 1 | < 0.1%
12 | 1 | < 0.1%
13 | 1 | < 0.1%
14 | 1 | < 0.1%
15 | 1 | < 0.1%
16 | 1 | < 0.1%
Value | Count | Frequency (%)
469172 | 1 | < 0.1%
468707 | 1 | < 0.1%
468343 | 1 | < 0.1%
467731 | 1 | < 0.1%
465044 | 1 | < 0.1%
464819 | 1 | < 0.1%
464207 | 1 | < 0.1%
464111 | 1 | < 0.1%
463906 | 1 | < 0.1%
463800 | 1 | < 0.1%

original_language
Categorical

HIGH CARDINALITY  IMBALANCE 

Distinct: 89
Distinct (%): 0.2%
Missing: 11
Missing (%): < 0.1%
Memory size: 354.6 KiB
en | 32202
fr | 2437
it | 1528
ja | 1349
de | 1078
Other values (84) | 6771

Length

Max length: 2
Median length: 2
Mean length: 2
Min length: 2

Characters and Unicode

Total characters: 90730
Distinct characters: 26
Distinct categories: 1
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
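
Because every language code has length 2, the total character count should equal twice the number of non-missing rows. A quick check, assuming missing values contribute no characters:

```python
# Figures copied from the report for original_language.
n_observations = 45376
n_missing = 11
code_length = 2  # min, median, mean and max length are all 2

# Assumption: missing rows are excluded from the character count.
total_characters = (n_observations - n_missing) * code_length
print(total_characters)
```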

Unique

Unique: 17
Unique (%): < 0.1%

Sample

1st rowen
2nd rowen
3rd rowen
4th rowen
5th rowen

Common Values

Value | Count | Frequency (%)
en | 32202 | 71.0%
fr | 2437 | 5.4%
it | 1528 | 3.4%
ja | 1349 | 3.0%
de | 1078 | 2.4%
es | 992 | 2.2%
ru | 822 | 1.8%
hi | 508 | 1.1%
ko | 444 | 1.0%
zh | 408 | 0.9%
Other values (79) | 3597 | 7.9%

Length

Histogram of lengths of the category
ValueCountFrequency (%)
en 32202
71.0%
fr 2437
 
5.4%
it 1528
 
3.4%
ja 1349
 
3.0%
de 1078
 
2.4%
es 992
 
2.2%
ru 822
 
1.8%
hi 508
 
1.1%
ko 444
 
1.0%
zh 408
 
0.9%
Other values (79) 3597
 
7.9%

Most occurring characters

ValueCountFrequency (%)
e 34527
38.1%
n 32910
36.3%
r 3630
 
4.0%
f 2835
 
3.1%
i 2388
 
2.6%
t 2250
 
2.5%
a 1839
 
2.0%
s 1652
 
1.8%
j 1350
 
1.5%
d 1323
 
1.5%
Other values (16) 6026
 
6.6%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter 90730
100.0%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
e 34527
38.1%
n 32910
36.3%
r 3630
 
4.0%
f 2835
 
3.1%
i 2388
 
2.6%
t 2250
 
2.5%
a 1839
 
2.0%
s 1652
 
1.8%
j 1350
 
1.5%
d 1323
 
1.5%
Other values (16) 6026
 
6.6%

Most occurring scripts

ValueCountFrequency (%)
Latin 90730
100.0%

Most frequent character per script

Latin
ValueCountFrequency (%)
e 34527
38.1%
n 32910
36.3%
r 3630
 
4.0%
f 2835
 
3.1%
i 2388
 
2.6%
t 2250
 
2.5%
a 1839
 
2.0%
s 1652
 
1.8%
j 1350
 
1.5%
d 1323
 
1.5%
Other values (16) 6026
 
6.6%

Most occurring blocks

ValueCountFrequency (%)
ASCII 90730
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
e 34527
38.1%
n 32910
36.3%
r 3630
 
4.0%
f 2835
 
3.1%
i 2388
 
2.6%
t 2250
 
2.5%
a 1839
 
2.0%
s 1652
 
1.8%
j 1350
 
1.5%
d 1323
 
1.5%
Other values (16) 6026
 
6.6%

overview
Categorical

HIGH CARDINALITY  MISSING  UNIFORM 

Distinct: 44232
Distinct (%): 99.5%
Missing: 941
Missing (%): 2.1%
Memory size: 354.6 KiB
No overview found. | 133
No Overview | 7
(blank) | 5
Recovering from a nail gun shot to the head and 13 months of coma, doctor Pekka Valinta starts to unravel the mystery of his past, still suffering from total amnesia. | 3
No movie overview available. | 3
Other values (44227) | 44284

Length

Max length: 1000
Median length: 786
Mean length: 323.29706
Min length: 1

Characters and Unicode

Total characters: 14365705
Distinct characters: 429
Distinct categories: 25
Distinct scripts: 13
Distinct blocks: 21
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 44173
Unique (%): 99.4%

Sample

1st rowLed by Woody, Andy's toys live happily in his room until Andy's birthday brings Buzz Lightyear onto the scene. Afraid of losing his place in Andy's heart, Woody plots against Buzz. But when circumstances separate Buzz and Woody from their owner, the duo eventually learns to put aside their differences.
2nd rowWhen siblings Judy and Peter discover an enchanted board game that opens the door to a magical world, they unwittingly invite Alan -- an adult who's been trapped inside the game for 26 years -- into their living room. Alan's only hope for freedom is to finish the game, which proves risky as all three find themselves running from giant rhinoceroses, evil monkeys and other terrifying creatures.
3rd rowA family wedding reignites the ancient feud between next-door neighbors and fishing buddies John and Max. Meanwhile, a sultry Italian divorcée opens a restaurant at the local bait shop, alarming the locals who worry she'll scare the fish away. But she's less interested in seafood than she is in cooking up a hot time with Max.
4th rowCheated on, mistreated and stepped on, the women are holding their breath, waiting for the elusive "good man" to break a string of less-than-stellar lovers. Friends and confidants Vannah, Bernie, Glo and Robin talk it all out, determined to find a better way to breathe.
5th rowJust when George Banks has recovered from his daughter's wedding, he receives the news that she's pregnant ... and that George's wife, Nina, is expecting too. He was planning on selling their home, but that's a plan that -- like George -- will have to change with the arrival of both a grandchild and a kid of his own.

Common Values

Value | Count | Frequency (%)
No overview found. | 133 | 0.3%
No Overview | 7 | < 0.1%
(blank) | 5 | < 0.1%
Recovering from a nail gun shot to the head and 13 months of coma, doctor Pekka Valinta starts to unravel the mystery of his past, still suffering from total amnesia. | 3 | < 0.1%
No movie overview available. | 3 | < 0.1%
A few funny little novels about different aspects of life. | 3 | < 0.1%
Adaptation of the Jane Austen novel. | 3 | < 0.1%
King Lear, old and tired, divides his kingdom among his daughters, giving great importance to their protestations of love for him. When Cordelia, youngest and most honest, refuses to idly flatter the old man in return for favor, he banishes her and turns for support to his remaining daughters. But Goneril and Regan have no love for him and instead plot to take all his power from him. In a parallel, Lear's loyal courtier Gloucester favors his illegitimate son Edmund after being told lies about his faithful son Edgar. Madness and tragedy befall both ill-starred fathers. | 3 | < 0.1%
Poor but happy, young Nello and his grandfather live alone, delivering milk as a livelihood, in the outskirts of Antwerp, a city in Flanders (the Flemish or Dutch-speaking part of modern-day Belgium). They discover a beaten dog (a Bouvier, a large sturdy dog native to Flanders) and adopt it and nurse it back to health, naming it Patrasche, the middle name of Nello's mother Mary, who died when Nello was very young. Nello's mother was a talented artist, and like his mother, he delights in drawing, and his friend Aloise is his model and greatest fan and supporter. | 2 | < 0.1%
In Zola's Paris, an ingenue arrives at a tony bordello: she's Nana, guileless, but quickly learning to use her erotic innocence to get what she wants. She's an actress for a soft-core filmmaker and soon is the most popular courtesan in Paris, parlaying this into a house, bought for her by a wealthy banker. She tosses him and takes up with her neighbor, a count of impeccable rectitude, and with the count's impressionable son. The count is soon fetching sticks like a dog and mortgaging his lands to satisfy her whims. | 2 | < 0.1%
Other values (44222) | 44271 | 97.6%
(Missing) | 941 | 2.1%

Length

Histogram of lengths of the category
ValueCountFrequency (%)
the 138082
 
5.6%
a 98889
 
4.0%
and 75259
 
3.1%
to 73321
 
3.0%
of 69574
 
2.8%
in 48143
 
2.0%
is 36500
 
1.5%
his 36165
 
1.5%
with 23902
 
1.0%
her 21484
 
0.9%
Other values (97091) 1827389
74.6%

Most occurring characters

ValueCountFrequency (%)
2406350
16.8%
e 1363787
 
9.5%
a 940502
 
6.5%
t 934766
 
6.5%
i 851514
 
5.9%
o 829873
 
5.8%
n 822601
 
5.7%
s 767851
 
5.3%
r 744274
 
5.2%
h 600810
 
4.2%
Other values (419) 4103377
28.6%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter 11150061
77.6%
Space Separator 2406388
 
16.8%
Uppercase Letter 390962
 
2.7%
Other Punctuation 312824
 
2.2%
Decimal Number 42223
 
0.3%
Dash Punctuation 36767
 
0.3%
Close Punctuation 10100
 
0.1%
Open Punctuation 10077
 
0.1%
Final Punctuation 4556
 
< 0.1%
Initial Punctuation 882
 
< 0.1%
Other values (15) 865
 
< 0.1%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
e 1363787
12.2%
a 940502
 
8.4%
t 934766
 
8.4%
i 851514
 
7.6%
o 829873
 
7.4%
n 822601
 
7.4%
s 767851
 
6.9%
r 744274
 
6.7%
h 600810
 
5.4%
l 478813
 
4.3%
Other values (142) 2815270
25.2%
Uppercase Letter
ValueCountFrequency (%)
A 42751
 
10.9%
T 35968
 
9.2%
S 31126
 
8.0%
M 23954
 
6.1%
B 23699
 
6.1%
C 22803
 
5.8%
H 19429
 
5.0%
W 18652
 
4.8%
I 16798
 
4.3%
D 16311
 
4.2%
Other values (77) 139471
35.7%
Other Letter
ValueCountFrequency (%)
6
 
4.8%
6
 
4.8%
5
 
4.0%
4
 
3.2%
3
 
2.4%
3
 
2.4%
3
 
2.4%
3
 
2.4%
2
 
1.6%
م 2
 
1.6%
Other values (76) 88
70.4%
Other Punctuation
ValueCountFrequency (%)
, 133443
42.7%
. 124794
39.9%
' 31121
 
9.9%
" 11661
 
3.7%
: 3299
 
1.1%
? 2759
 
0.9%
; 2493
 
0.8%
! 1543
 
0.5%
/ 765
 
0.2%
& 453
 
0.1%
Other values (12) 493
 
0.2%
Nonspacing Mark
ValueCountFrequency (%)
́ 4
12.1%
ి 4
12.1%
3
9.1%
3
9.1%
3
9.1%
̈ 3
9.1%
2
 
6.1%
2
 
6.1%
2
 
6.1%
2
 
6.1%
Other values (4) 5
15.2%
Decimal Number
ValueCountFrequency (%)
1 9748
23.1%
0 8265
19.6%
9 6405
15.2%
2 4251
10.1%
5 2440
 
5.8%
8 2379
 
5.6%
3 2342
 
5.5%
4 2176
 
5.2%
7 2131
 
5.0%
6 2086
 
4.9%
Spacing Mark
ValueCountFrequency (%)
11
40.7%
4
 
14.8%
3
 
11.1%
3
 
11.1%
ि 2
 
7.4%
2
 
7.4%
1
 
3.7%
ி 1
 
3.7%
Dash Punctuation
ValueCountFrequency (%)
- 35244
95.9%
881
 
2.4%
633
 
1.7%
5
 
< 0.1%
4
 
< 0.1%
Other Symbol
ValueCountFrequency (%)
® 45
70.3%
14
 
21.9%
¦ 2
 
3.1%
° 2
 
3.1%
1
 
1.6%
Math Symbol
ValueCountFrequency (%)
~ 20
50.0%
+ 11
27.5%
= 6
 
15.0%
| 2
 
5.0%
1
 
2.5%
Open Punctuation
ValueCountFrequency (%)
( 10024
99.5%
[ 50
 
0.5%
{ 2
 
< 0.1%
1
 
< 0.1%
Currency Symbol
ValueCountFrequency (%)
$ 317
96.4%
£ 10
 
3.0%
1
 
0.3%
1
 
0.3%
Space Separator
ValueCountFrequency (%)
2406350
> 99.9%
  36
 
< 0.1%
  2
 
< 0.1%
Close Punctuation
ValueCountFrequency (%)
) 10048
99.5%
] 50
 
0.5%
} 2
 
< 0.1%
Final Punctuation
ValueCountFrequency (%)
3847
84.4%
690
 
15.1%
» 19
 
0.4%
Initial Punctuation
ValueCountFrequency (%)
672
76.2%
192
 
21.8%
« 18
 
2.0%
Control
ValueCountFrequency (%)
106
96.4%
’ 3
 
2.7%
 1
 
0.9%
Modifier Symbol
ValueCountFrequency (%)
´ 25
65.8%
` 12
31.6%
¯ 1
 
2.6%
Format
ValueCountFrequency (%)
31
60.8%
­ 20
39.2%
Other Number
ValueCountFrequency (%)
½ 8
50.0%
¹ 8
50.0%
Connector Punctuation
ValueCountFrequency (%)
_ 19
100.0%
Line Separator
ValueCountFrequency (%)
7
100.0%
Letter Number
ValueCountFrequency (%)
2
100.0%
Paragraph Separator
ValueCountFrequency (%)
2
100.0%
Modifier Letter
ValueCountFrequency (%)
ʼ 2
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin 11535791
80.3%
Common 2824495
 
19.7%
Cyrillic 4587
 
< 0.1%
Greek 648
 
< 0.1%
Devanagari 77
 
< 0.1%
Telugu 30
 
< 0.1%
Hiragana 20
 
< 0.1%
Tamil 19
 
< 0.1%
Han 10
 
< 0.1%
Hangul 9
 
< 0.1%
Other values (3) 19
 
< 0.1%

Most frequent character per script

Latin
ValueCountFrequency (%)
e 1363787
11.8%
a 940502
 
8.2%
t 934766
 
8.1%
i 851514
 
7.4%
o 829873
 
7.2%
n 822601
 
7.1%
s 767851
 
6.7%
r 744274
 
6.5%
h 600810
 
5.2%
l 478813
 
4.2%
Other values (132) 3201000
27.7%
Common
ValueCountFrequency (%)
2406350
85.2%
, 133443
 
4.7%
. 124794
 
4.4%
- 35244
 
1.2%
' 31121
 
1.1%
" 11661
 
0.4%
) 10048
 
0.4%
( 10024
 
0.4%
1 9748
 
0.3%
0 8265
 
0.3%
Other values (71) 43797
 
1.6%
Cyrillic
ValueCountFrequency (%)
о 470
 
10.2%
е 404
 
8.8%
а 373
 
8.1%
н 323
 
7.0%
и 299
 
6.5%
т 265
 
5.8%
р 240
 
5.2%
с 218
 
4.8%
в 173
 
3.8%
л 161
 
3.5%
Other values (46) 1661
36.2%
Greek
ValueCountFrequency (%)
α 60
 
9.3%
ο 55
 
8.5%
τ 43
 
6.6%
ι 36
 
5.6%
η 36
 
5.6%
ν 34
 
5.2%
ε 31
 
4.8%
ρ 31
 
4.8%
π 30
 
4.6%
ς 30
 
4.6%
Other values (33) 262
40.4%
Devanagari
ValueCountFrequency (%)
11
 
14.3%
6
 
7.8%
6
 
7.8%
5
 
6.5%
4
 
5.2%
3
 
3.9%
3
 
3.9%
3
 
3.9%
3
 
3.9%
3
 
3.9%
Other values (21) 30
39.0%
Hiragana
ValueCountFrequency (%)
4
20.0%
1
 
5.0%
1
 
5.0%
1
 
5.0%
1
 
5.0%
1
 
5.0%
1
 
5.0%
1
 
5.0%
1
 
5.0%
1
 
5.0%
Other values (7) 7
35.0%
Telugu
ValueCountFrequency (%)
ి 4
13.3%
3
10.0%
3
10.0%
3
10.0%
2
 
6.7%
2
 
6.7%
2
 
6.7%
2
 
6.7%
2
 
6.7%
1
 
3.3%
Other values (6) 6
20.0%
Tamil
ValueCountFrequency (%)
3
15.8%
2
10.5%
2
10.5%
2
10.5%
2
10.5%
1
 
5.3%
1
 
5.3%
1
 
5.3%
1
 
5.3%
1
 
5.3%
Other values (3) 3
15.8%
Han
ValueCountFrequency (%)
1
10.0%
1
10.0%
1
10.0%
1
10.0%
1
10.0%
1
10.0%
1
10.0%
1
10.0%
1
10.0%
1
10.0%
Hangul
ValueCountFrequency (%)
2
22.2%
1
11.1%
1
11.1%
1
11.1%
1
11.1%
1
11.1%
1
11.1%
1
11.1%
Thai
ValueCountFrequency (%)
2
25.0%
1
12.5%
1
12.5%
1
12.5%
1
12.5%
1
12.5%
1
12.5%
Arabic
ValueCountFrequency (%)
م 2
50.0%
ہ 1
25.0%
ت 1
25.0%
Inherited
ValueCountFrequency (%)
́ 4
57.1%
̈ 3
42.9%

Most occurring blocks

ValueCountFrequency (%)
ASCII 14347707
99.9%
Punctuation 7270
 
0.1%
None 5930
 
< 0.1%
Cyrillic 4587
 
< 0.1%
Devanagari 77
 
< 0.1%
Telugu 30
 
< 0.1%
Hiragana 20
 
< 0.1%
Tamil 19
 
< 0.1%
Letterlike Symbols 14
 
< 0.1%
CJK 10
 
< 0.1%
Other values (11) 41
 
< 0.1%

Most frequent character per block

ASCII
ValueCountFrequency (%)
2406350
16.8%
e 1363787
 
9.5%
a 940502
 
6.6%
t 934766
 
6.5%
i 851514
 
5.9%
o 829873
 
5.8%
n 822601
 
5.7%
s 767851
 
5.4%
r 744274
 
5.2%
h 600810
 
4.2%
Other values (82) 4085379
28.5%
Punctuation
ValueCountFrequency (%)
3847
52.9%
881
 
12.1%
690
 
9.5%
672
 
9.2%
633
 
8.7%
303
 
4.2%
192
 
2.6%
31
 
0.4%
7
 
0.1%
5
 
0.1%
Other values (4) 9
 
0.1%
None
ValueCountFrequency (%)
é 1552
26.2%
ä 294
 
5.0%
á 293
 
4.9%
ö 250
 
4.2%
í 243
 
4.1%
è 209
 
3.5%
ü 178
 
3.0%
ı 165
 
2.8%
ó 164
 
2.8%
ç 158
 
2.7%
Other values (141) 2424
40.9%
Cyrillic
ValueCountFrequency (%)
о 470
 
10.2%
е 404
 
8.8%
а 373
 
8.1%
н 323
 
7.0%
и 299
 
6.5%
т 265
 
5.8%
р 240
 
5.2%
с 218
 
4.8%
в 173
 
3.8%
л 161
 
3.5%
Other values (46) 1661
36.2%
Letterlike Symbols
ValueCountFrequency (%)
14
100.0%
Devanagari
ValueCountFrequency (%)
11
 
14.3%
6
 
7.8%
6
 
7.8%
5
 
6.5%
4
 
5.2%
3
 
3.9%
3
 
3.9%
3
 
3.9%
3
 
3.9%
3
 
3.9%
Other values (21) 30
39.0%
Alphabetic PF
ValueCountFrequency (%)
4
100.0%
Hiragana
ValueCountFrequency (%)
4
20.0%
1
 
5.0%
1
 
5.0%
1
 
5.0%
1
 
5.0%
1
 
5.0%
1
 
5.0%
1
 
5.0%
1
 
5.0%
1
 
5.0%
Other values (7) 7
35.0%
Diacriticals
ValueCountFrequency (%)
́ 4
57.1%
̈ 3
42.9%
Telugu
ValueCountFrequency (%)
ి 4
13.3%
3
10.0%
3
10.0%
3
10.0%
2
 
6.7%
2
 
6.7%
2
 
6.7%
2
 
6.7%
2
 
6.7%
1
 
3.3%
Other values (6) 6
20.0%
Tamil
ValueCountFrequency (%)
3
15.8%
2
10.5%
2
10.5%
2
10.5%
2
10.5%
1
 
5.3%
1
 
5.3%
1
 
5.3%
1
 
5.3%
1
 
5.3%
Other values (3) 3
15.8%
Arabic
ValueCountFrequency (%)
م 2
50.0%
ہ 1
25.0%
ت 1
25.0%
Hangul
ValueCountFrequency (%)
2
22.2%
1
11.1%
1
11.1%
1
11.1%
1
11.1%
1
11.1%
1
11.1%
1
11.1%
Number Forms
ValueCountFrequency (%)
2
100.0%
Modifier Letters
ValueCountFrequency (%)
ʼ 2
100.0%
Thai
ValueCountFrequency (%)
2
25.0%
1
12.5%
1
12.5%
1
12.5%
1
12.5%
1
12.5%
1
12.5%
CJK
ValueCountFrequency (%)
1
10.0%
1
10.0%
1
10.0%
1
10.0%
1
10.0%
1
10.0%
1
10.0%
1
10.0%
1
10.0%
1
10.0%
Math Operators
ValueCountFrequency (%)
1
100.0%
Katakana
ValueCountFrequency (%)
1
100.0%
Currency Symbols
ValueCountFrequency (%)
1
50.0%
1
50.0%
Specials
ValueCountFrequency (%)
1
100.0%

popularity
Real number (ℝ)

Distinct: 43731
Distinct (%): 96.4%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 2.9264576
Minimum: 0
Maximum: 547.4883
Zeros: 40
Zeros (%): 0.1%
Negative: 0
Negative (%): 0.0%
Memory size: 354.6 KiB

Quantile statistics

Minimum: 0
5-th percentile: 0.02079775
Q1: 0.3888395
Median: 1.1304545
Q3: 3.6916945
95-th percentile: 11.063627
Maximum: 547.4883
Range: 547.4883
Interquartile range (IQR): 3.302855

Descriptive statistics

Standard deviation: 6.0096718
Coefficient of variation (CV): 2.0535653
Kurtosis: 1923.6882
Mean: 2.9264576
Median Absolute Deviation (MAD): 0.9676215
Skewness: 29.215066
Sum: 132790.94
Variance: 36.116156
Monotonicity: Not monotonic
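
Several of these statistics are simple functions of the others, which gives a cheap consistency check. The sketch below uses the standard definitions (IQR = Q3 - Q1, CV = standard deviation / mean, variance = standard deviation squared) with inputs copied from the tables for popularity; small differences in the last digits come from the rounding of the printed inputs.

```python
# Inputs copied from the quantile and descriptive statistics for popularity.
mean = 2.9264576
std = 6.0096718
q1 = 0.3888395
q3 = 3.6916945

iqr = q3 - q1          # interquartile range
cv = std / mean        # coefficient of variation
variance = std ** 2    # variance from the standard deviation

print(f"IQR={iqr:.6f}, CV={cv:.7f}, variance={variance:.6f}")
```

A standard deviation about twice the mean, together with the skewness of about 29.2, reflects the long right tail up to the maximum of 547.4883.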
Histogram with fixed size bins (bins=50)
Value | Count | Frequency (%)
1 × 10^-6 | 56 | 0.1%
0.000308 | 42 | 0.1%
0 | 40 | 0.1%
0.00022 | 39 | 0.1%
0.000844 | 38 | 0.1%
0.000578 | 38 | 0.1%
0.001177 | 38 | 0.1%
0.002001 | 27 | 0.1%
0.003013 | 21 | < 0.1%
0.00353 | 19 | < 0.1%
Other values (43721) | 45018 | 99.2%
Value | Count | Frequency (%)
0 | 40 | 0.1%
1 × 10^-6 | 56 | 0.1%
2 × 10^-6 | 6 | < 0.1%
3 × 10^-6 | 6 | < 0.1%
4 × 10^-6 | 5 | < 0.1%
5 × 10^-6 | 1 | < 0.1%
6 × 10^-6 | 2 | < 0.1%
7 × 10^-6 | 1 | < 0.1%
8 × 10^-6 | 6 | < 0.1%
9 × 10^-6 | 2 | < 0.1%
Value | Count | Frequency (%)
547.488298 | 1 | < 0.1%
294.337037 | 1 | < 0.1%
287.253654 | 1 | < 0.1%
228.032744 | 1 | < 0.1%
213.849907 | 1 | < 0.1%
187.860492 | 1 | < 0.1%
185.330992 | 1 | < 0.1%
185.070892 | 1 | < 0.1%
183.870374 | 1 | < 0.1%
154.801009 | 1 | < 0.1%

release_date
Categorical

Distinct: 17333
Distinct (%): 38.2%
Missing: 0
Missing (%): 0.0%
Memory size: 354.6 KiB
2008-01-01 | 136
2009-01-01 | 121
2007-01-01 | 118
2005-01-01 | 111
2006-01-01 | 101
Other values (17328) | 44789

Length

Max length: 10
Median length: 10
Mean length: 10
Min length: 10

Characters and Unicode

Total characters: 453760
Distinct characters: 11
Distinct categories: 2
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 8570
Unique (%): 18.9%

Sample

1st row1995-10-30
2nd row1995-12-15
3rd row1995-12-22
4th row1995-12-22
5th row1995-02-10

Common Values

Value | Count | Frequency (%)
2008-01-01 | 136 | 0.3%
2009-01-01 | 121 | 0.3%
2007-01-01 | 118 | 0.3%
2005-01-01 | 111 | 0.2%
2006-01-01 | 101 | 0.2%
2002-01-01 | 96 | 0.2%
2004-01-01 | 90 | 0.2%
2001-01-01 | 84 | 0.2%
2003-01-01 | 76 | 0.2%
1997-01-01 | 69 | 0.2%
Other values (17323) | 44374 | 97.8%

Length

Histogram of lengths of the category
ValueCountFrequency (%)
2008-01-01 136
 
0.3%
2009-01-01 121
 
0.3%
2007-01-01 118
 
0.3%
2005-01-01 111
 
0.2%
2006-01-01 101
 
0.2%
2002-01-01 96
 
0.2%
2004-01-01 90
 
0.2%
2001-01-01 84
 
0.2%
2003-01-01 76
 
0.2%
1997-01-01 69
 
0.2%
Other values (17323) 44374
97.8%

Most occurring characters

ValueCountFrequency (%)
0 97600
21.5%
- 90752
20.0%
1 84054
18.5%
2 52803
11.6%
9 39773
8.8%
3 15435
 
3.4%
8 15279
 
3.4%
6 15021
 
3.3%
5 14836
 
3.3%
7 14289
 
3.1%

Most occurring categories

ValueCountFrequency (%)
Decimal Number 363008
80.0%
Dash Punctuation 90752
 
20.0%

Most frequent character per category

Decimal Number
ValueCountFrequency (%)
0 97600
26.9%
1 84054
23.2%
2 52803
14.5%
9 39773
11.0%
3 15435
 
4.3%
8 15279
 
4.2%
6 15021
 
4.1%
5 14836
 
4.1%
7 14289
 
3.9%
4 13918
 
3.8%
Dash Punctuation
ValueCountFrequency (%)
- 90752
100.0%

Most occurring scripts

ValueCountFrequency (%)
Common 453760
100.0%

Most frequent character per script

Common
ValueCountFrequency (%)
0 97600
21.5%
- 90752
20.0%
1 84054
18.5%
2 52803
11.6%
9 39773
8.8%
3 15435
 
3.4%
8 15279
 
3.4%
6 15021
 
3.3%
5 14836
 
3.3%
7 14289
 
3.1%

Most occurring blocks

ValueCountFrequency (%)
ASCII 453760
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
0 97600
21.5%
- 90752
20.0%
1 84054
18.5%
2 52803
11.6%
9 39773
8.8%
3 15435
 
3.4%
8 15279
 
3.4%
6 15021
 
3.3%
5 14836
 
3.3%
7 14289
 
3.1%

revenue
Real number (ℝ)

HIGH CORRELATION  ZEROS 

Distinct: 6863
Distinct (%): 15.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 11230099
Minimum: 0
Maximum: 2.7879651 × 10^9
Zeros: 37969
Zeros (%): 83.7%
Negative: 0
Negative (%): 0.0%
Memory size: 354.6 KiB

Quantile statistics

Minimum: 0
5-th percentile: 0
Q1: 0
Median: 0
Q3: 0
95-th percentile: 48020044
Maximum: 2.7879651 × 10^9
Range: 2.7879651 × 10^9
Interquartile range (IQR): 0

Descriptive statistics

Standard deviation: 64389957
Coefficient of variation (CV): 5.7336944
Kurtosis: 237.07741
Mean: 11230099
Median Absolute Deviation (MAD): 0
Skewness: 12.254722
Sum: 5.0957698 × 10^11
Variance: 4.1460665 × 10^15
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)
Value | Count | Frequency (%)
0 | 37969 | 83.7%
12000000 | 20 | < 0.1%
10000000 | 19 | < 0.1%
11000000 | 19 | < 0.1%
2000000 | 18 | < 0.1%
6000000 | 17 | < 0.1%
5000000 | 14 | < 0.1%
8000000 | 13 | < 0.1%
500000 | 13 | < 0.1%
1 | 12 | < 0.1%
Other values (6853) | 7262 | 16.0%
Value | Count | Frequency (%)
0 | 37969 | 83.7%
1 | 12 | < 0.1%
2 | 3 | < 0.1%
3 | 9 | < 0.1%
4 | 4 | < 0.1%
5 | 5 | < 0.1%
6 | 2 | < 0.1%
7 | 4 | < 0.1%
8 | 5 | < 0.1%
9 | 1 | < 0.1%
Value | Count | Frequency (%)
2787965087 | 1 | < 0.1%
2068223624 | 1 | < 0.1%
1845034188 | 1 | < 0.1%
1519557910 | 1 | < 0.1%
1513528810 | 1 | < 0.1%
1506249360 | 1 | < 0.1%
1405403694 | 1 | < 0.1%
1342000000 | 1 | < 0.1%
1274219009 | 1 | < 0.1%
1262886337 | 1 | < 0.1%

runtime
Real number (ℝ)

Distinct: 353
Distinct (%): 0.8%
Missing: 246
Missing (%): 0.5%
Infinite: 0
Infinite (%): 0.0%
Mean: 94.181675
Minimum: 0
Maximum: 1256
Zeros: 1535
Zeros (%): 3.4%
Negative: 0
Negative (%): 0.0%
Memory size: 354.6 KiB

Quantile statistics

Minimum: 0
5-th percentile: 12
Q1: 85
Median: 95
Q3: 107
95-th percentile: 138
Maximum: 1256
Range: 1256
Interquartile range (IQR): 22

Descriptive statistics

Standard deviation: 38.341059
Coefficient of variation (CV): 0.4070968
Kurtosis: 93.925543
Mean: 94.181675
Median Absolute Deviation (MAD): 11
Skewness: 4.4907363
Sum: 4250419
Variance: 1470.0368
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)
Value | Count | Frequency (%)
90 | 2549 | 5.6%
0 | 1535 | 3.4%
100 | 1470 | 3.2%
95 | 1410 | 3.1%
93 | 1214 | 2.7%
96 | 1104 | 2.4%
92 | 1079 | 2.4%
94 | 1062 | 2.3%
91 | 1055 | 2.3%
88 | 1030 | 2.3%
Other values (343) | 31622 | 69.7%
Value | Count | Frequency (%)
0 | 1535 | 3.4%
1 | 107 | 0.2%
2 | 33 | 0.1%
3 | 48 | 0.1%
4 | 50 | 0.1%
5 | 51 | 0.1%
6 | 72 | 0.2%
7 | 103 | 0.2%
8 | 78 | 0.2%
9 | 63 | 0.1%
Value | Count | Frequency (%)
1256 | 1 | < 0.1%
1140 | 2 | < 0.1%
931 | 1 | < 0.1%
925 | 1 | < 0.1%
900 | 1 | < 0.1%
877 | 1 | < 0.1%
874 | 1 | < 0.1%
840 | 2 | < 0.1%
780 | 1 | < 0.1%
720 | 1 | < 0.1%

status
Categorical

Distinct: 6
Distinct (%): < 0.1%
Missing: 80
Missing (%): 0.2%
Memory size: 354.6 KiB
Released | 44936
Rumored | 230
Post Production | 97
In Production | 19
Planned | 13

Length

Max length: 15
Median length: 8
Mean length: 8.0117229
Min length: 7

Characters and Unicode

Total characters: 362899
Distinct characters: 18
Distinct categories: 3
Distinct scripts: 2
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 1
Unique (%): < 0.1%

Sample

1st rowReleased
2nd rowReleased
3rd rowReleased
4th rowReleased
5th rowReleased

Common Values

Value | Count | Frequency (%)
Released | 44936 | 99.0%
Rumored | 230 | 0.5%
Post Production | 97 | 0.2%
In Production | 19 | < 0.1%
Planned | 13 | < 0.1%
Canceled | 1 | < 0.1%
(Missing) | 80 | 0.2%
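
The 97.0% imbalance reported for status in the alerts can be reproduced from these counts. The report does not state its formula; the sketch below assumes the score is one minus the Shannon entropy of the category shares, normalised by the logarithm of the number of categories, with missing rows excluded. Treat it as an illustration of that assumed definition rather than the tool's documented method.

```python
import math

# Counts copied from the "Common Values" table; "(Missing)" is excluded
# (assumption: missing rows do not count as a category).
counts = {
    "Released": 44936,
    "Rumored": 230,
    "Post Production": 97,
    "In Production": 19,
    "Planned": 13,
    "Canceled": 1,
}

total = sum(counts.values())
shares = [count / total for count in counts.values()]

# Shannon entropy of the category shares, normalised to the range [0, 1].
entropy = -sum(p * math.log(p) for p in shares)
normalised_entropy = entropy / math.log(len(counts))

# Assumed definition: 0 means perfectly balanced, 1 means a single category.
imbalance = 1 - normalised_entropy
print(f"imbalance: {imbalance:.1%}")
```

A score this close to 1 means the column carries almost no information: nearly every row is "Released".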

Length

Histogram of lengths of the category

Common Values (Plot)

ValueCountFrequency (%)
released 44936
99.0%
rumored 230
 
0.5%
production 116
 
0.3%
post 97
 
0.2%
in 19
 
< 0.1%
planned 13
 
< 0.1%
canceled 1
 
< 0.1%

Most occurring characters

ValueCountFrequency (%)
e 135053
37.2%
d 45296
 
12.5%
R 45166
 
12.4%
s 45033
 
12.4%
l 44950
 
12.4%
a 44950
 
12.4%
o 559
 
0.2%
r 346
 
0.1%
u 346
 
0.1%
m 230
 
0.1%
Other values (8) 970
 
0.3%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter 317371
87.5%
Uppercase Letter 45412
 
12.5%
Space Separator 116
 
< 0.1%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
e 135053
42.6%
d 45296
 
14.3%
s 45033
 
14.2%
l 44950
 
14.2%
a 44950
 
14.2%
o 559
 
0.2%
r 346
 
0.1%
u 346
 
0.1%
m 230
 
0.1%
t 213
 
0.1%
Other values (3) 395
 
0.1%
Uppercase Letter
ValueCountFrequency (%)
R 45166
99.5%
P 226
 
0.5%
I 19
 
< 0.1%
C 1
 
< 0.1%
Space Separator
ValueCountFrequency (%)
116
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin 362783
> 99.9%
Common 116
 
< 0.1%

Most frequent character per script

Latin
ValueCountFrequency (%)
e 135053
37.2%
d 45296
 
12.5%
R 45166
 
12.4%
s 45033
 
12.4%
l 44950
 
12.4%
a 44950
 
12.4%
o 559
 
0.2%
r 346
 
0.1%
u 346
 
0.1%
m 230
 
0.1%
Other values (7) 854
 
0.2%
Common
ValueCountFrequency (%)
116
100.0%

Most occurring blocks

ValueCountFrequency (%)
ASCII 362899
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
e 135053
37.2%
d 45296
 
12.5%
R 45166
 
12.4%
s 45033
 
12.4%
l 44950
 
12.4%
a 44950
 
12.4%
o 559
 
0.2%
r 346
 
0.1%
u 346
 
0.1%
m 230
 
0.1%
Other values (8) 970
 
0.3%

tagline
Categorical

HIGH CARDINALITY  MISSING  UNIFORM 

Distinct: 20269
Distinct (%): 99.4%
Missing: 24978
Missing (%): 55.0%
Memory size: 354.6 KiB
Based on a true story.             7
Trust no one.                      4
Be careful what you wish for.      4
-                                  4
How far would you go?              3
Other values (20264)           20376

Length

Max length: 297
Median length: 204
Mean length: 46.999314
Min length: 1

Characters and Unicode

Total characters: 958692
Distinct characters: 170
Distinct categories: 17
Distinct scripts: 6
Distinct blocks: 10
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 20163
Unique (%): 98.8%

Sample

1st row: Roll the dice and unleash the excitement!
2nd row: Still Yelling. Still Fighting. Still Ready for Love.
3rd row: Friends are the people who let you be yourself... and never let you forget it.
4th row: Just When His World Is Back To Normal... He's In For The Surprise Of His Life!
5th row: A Los Angeles Crime Saga

Common Values

Value                                     Count  Frequency (%)
Based on a true story.                        7  < 0.1%
Trust no one.                                 4  < 0.1%
Be careful what you wish for.                 4  < 0.1%
-                                             4  < 0.1%
How far would you go?                         3  < 0.1%
Drama                                         3  < 0.1%
Classic Albums                                3  < 0.1%
There are two sides to every love story.      3  < 0.1%
There is no turning back                      3  < 0.1%
Documentary                                   3  < 0.1%
Other values (20259)                      20361  44.9%
(Missing)                                 24978  55.0%

Length

Histogram of lengths of the category
ValueCountFrequency (%)
the 10998
 
6.3%
a 6815
 
3.9%
of 4404
 
2.5%
to 3584
 
2.1%
is 2796
 
1.6%
in 2693
 
1.5%
and 2682
 
1.5%
you 2389
 
1.4%
1582
 
0.9%
for 1523
 
0.9%
Other values (15100) 134470
77.3%

Most occurring characters

ValueCountFrequency (%)
153686
16.0%
e 94412
 
9.8%
t 57267
 
6.0%
o 56566
 
5.9%
a 51473
 
5.4%
n 47498
 
5.0%
i 46036
 
4.8%
r 44992
 
4.7%
s 42360
 
4.4%
h 37172
 
3.9%
Other values (160) 327230
34.1%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter 680479
71.0%
Space Separator 153686
 
16.0%
Uppercase Letter 74991
 
7.8%
Other Punctuation 44585
 
4.7%
Decimal Number 2687
 
0.3%
Dash Punctuation 1944
 
0.2%
Final Punctuation 98
 
< 0.1%
Open Punctuation 56
 
< 0.1%
Close Punctuation 55
 
< 0.1%
Currency Symbol 37
 
< 0.1%
Other values (7) 74
 
< 0.1%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
e 94412
13.9%
t 57267
 
8.4%
o 56566
 
8.3%
a 51473
 
7.6%
n 47498
 
7.0%
i 46036
 
6.8%
r 44992
 
6.6%
s 42360
 
6.2%
h 37172
 
5.5%
l 30174
 
4.4%
Other values (43) 172529
25.4%
Other Letter
ValueCountFrequency (%)
1
 
2.9%
1
 
2.9%
1
 
2.9%
1
 
2.9%
1
 
2.9%
1
 
2.9%
1
 
2.9%
1
 
2.9%
1
 
2.9%
1
 
2.9%
Other values (24) 24
70.6%
Uppercase Letter
ValueCountFrequency (%)
T 10009
 
13.3%
A 6874
 
9.2%
S 5652
 
7.5%
H 4402
 
5.9%
I 4387
 
5.9%
E 4306
 
5.7%
W 3681
 
4.9%
O 3477
 
4.6%
N 3195
 
4.3%
L 3194
 
4.3%
Other values (20) 25814
34.4%
Other Punctuation
ValueCountFrequency (%)
. 26647
59.8%
! 5784
 
13.0%
' 5674
 
12.7%
, 4226
 
9.5%
? 1161
 
2.6%
" 582
 
1.3%
148
 
0.3%
: 138
 
0.3%
& 83
 
0.2%
* 42
 
0.1%
Other values (7) 100
 
0.2%
Decimal Number
ValueCountFrequency (%)
0 802
29.8%
1 516
19.2%
2 299
 
11.1%
3 208
 
7.7%
9 208
 
7.7%
5 168
 
6.3%
4 140
 
5.2%
6 121
 
4.5%
7 121
 
4.5%
8 104
 
3.9%
Math Symbol
ValueCountFrequency (%)
+ 5
35.7%
= 5
35.7%
| 2
 
14.3%
~ 1
 
7.1%
1
 
7.1%
Dash Punctuation
ValueCountFrequency (%)
- 1927
99.1%
9
 
0.5%
8
 
0.4%
Final Punctuation
ValueCountFrequency (%)
82
83.7%
15
 
15.3%
» 1
 
1.0%
Initial Punctuation
ValueCountFrequency (%)
14
73.7%
4
 
21.1%
« 1
 
5.3%
Open Punctuation
ValueCountFrequency (%)
( 49
87.5%
[ 7
 
12.5%
Close Punctuation
ValueCountFrequency (%)
) 48
87.3%
] 7
 
12.7%
Other Number
ValueCountFrequency (%)
½ 2
66.7%
² 1
33.3%
Modifier Letter
ValueCountFrequency (%)
ˌ 1
50.0%
ˈ 1
50.0%
Space Separator
ValueCountFrequency (%)
153686
100.0%
Currency Symbol
ValueCountFrequency (%)
$ 37
100.0%
Nonspacing Mark
ValueCountFrequency (%)
1
100.0%
Connector Punctuation
ValueCountFrequency (%)
_ 1
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin 755470
78.8%
Common 203187
 
21.2%
Han 21
 
< 0.1%
Tamil 5
 
< 0.1%
Hiragana 5
 
< 0.1%
Katakana 4
 
< 0.1%

Most frequent character per script

Latin
ValueCountFrequency (%)
e 94412
 
12.5%
t 57267
 
7.6%
o 56566
 
7.5%
a 51473
 
6.8%
n 47498
 
6.3%
i 46036
 
6.1%
r 44992
 
6.0%
s 42360
 
5.6%
h 37172
 
4.9%
l 30174
 
4.0%
Other values (73) 247520
32.8%
Common
ValueCountFrequency (%)
153686
75.6%
. 26647
 
13.1%
! 5784
 
2.8%
' 5674
 
2.8%
, 4226
 
2.1%
- 1927
 
0.9%
? 1161
 
0.6%
0 802
 
0.4%
" 582
 
0.3%
1 516
 
0.3%
Other values (42) 2182
 
1.1%
Han
ValueCountFrequency (%)
1
 
4.8%
1
 
4.8%
1
 
4.8%
1
 
4.8%
1
 
4.8%
1
 
4.8%
1
 
4.8%
1
 
4.8%
1
 
4.8%
1
 
4.8%
Other values (11) 11
52.4%
Tamil
ValueCountFrequency (%)
1
20.0%
1
20.0%
1
20.0%
1
20.0%
1
20.0%
Hiragana
ValueCountFrequency (%)
1
20.0%
1
20.0%
1
20.0%
1
20.0%
1
20.0%
Katakana
ValueCountFrequency (%)
1
25.0%
1
25.0%
1
25.0%
1
25.0%

Most occurring blocks

ValueCountFrequency (%)
ASCII 958262
> 99.9%
Punctuation 280
 
< 0.1%
None 110
 
< 0.1%
CJK 21
 
< 0.1%
Tamil 5
 
< 0.1%
Hiragana 5
 
< 0.1%
Katakana 4
 
< 0.1%
IPA Ext 2
 
< 0.1%
Modifier Letters 2
 
< 0.1%
Math Operators 1
 
< 0.1%

Most frequent character per block

ASCII
ValueCountFrequency (%)
153686
16.0%
e 94412
 
9.9%
t 57267
 
6.0%
o 56566
 
5.9%
a 51473
 
5.4%
n 47498
 
5.0%
i 46036
 
4.8%
r 44992
 
4.7%
s 42360
 
4.4%
h 37172
 
3.9%
Other values (78) 326800
34.1%
Punctuation
ValueCountFrequency (%)
148
52.9%
82
29.3%
15
 
5.4%
14
 
5.0%
9
 
3.2%
8
 
2.9%
4
 
1.4%
None
ValueCountFrequency (%)
é 18
16.4%
ä 16
14.5%
ö 8
 
7.3%
á 6
 
5.5%
ó 6
 
5.5%
ü 5
 
4.5%
í 5
 
4.5%
ı 5
 
4.5%
· 4
 
3.6%
ć 3
 
2.7%
Other values (26) 34
30.9%
IPA Ext
ValueCountFrequency (%)
ə 2
100.0%
Tamil
ValueCountFrequency (%)
1
20.0%
1
20.0%
1
20.0%
1
20.0%
1
20.0%
CJK
ValueCountFrequency (%)
1
 
4.8%
1
 
4.8%
1
 
4.8%
1
 
4.8%
1
 
4.8%
1
 
4.8%
1
 
4.8%
1
 
4.8%
1
 
4.8%
1
 
4.8%
Other values (11) 11
52.4%
Katakana
ValueCountFrequency (%)
1
25.0%
1
25.0%
1
25.0%
1
25.0%
Modifier Letters
ValueCountFrequency (%)
ˌ 1
50.0%
ˈ 1
50.0%
Hiragana
ValueCountFrequency (%)
1
20.0%
1
20.0%
1
20.0%
1
20.0%
1
20.0%
Math Operators
ValueCountFrequency (%)
1
100.0%

title
Categorical

HIGH CARDINALITY  UNIFORM 

Distinct: 42196
Distinct (%): 93.0%
Missing: 0
Missing (%): 0.0%
Memory size: 354.6 KiB
Cinderella                   11
Alice in Wonderland           9
Hamlet                        9
Les Misérables                8
Beauty and the Beast          8
Other values (42191)      45331

Length

Max length: 105
Median length: 79
Mean length: 16.701781
Min length: 1

Characters and Unicode

Total characters: 757860
Distinct characters: 287
Distinct categories: 17
Distinct scripts: 7
Distinct blocks: 12
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 39869
Unique (%): 87.9%

Sample

1st row: Toy Story
2nd row: Jumanji
3rd row: Grumpier Old Men
4th row: Waiting to Exhale
5th row: Father of the Bride Part II

Common Values

Value                    Count  Frequency (%)
Cinderella                  11  < 0.1%
Alice in Wonderland          9  < 0.1%
Hamlet                       9  < 0.1%
Les Misérables               8  < 0.1%
Beauty and the Beast         8  < 0.1%
The Three Musketeers         7  < 0.1%
Blackout                     7  < 0.1%
Treasure Island              7  < 0.1%
A Christmas Carol            7  < 0.1%
The Journey                  6  < 0.1%
Other values (42186)     45297  99.8%

Length

Histogram of lengths of the category
ValueCountFrequency (%)
the 14555
 
10.7%
of 4930
 
3.6%
a 2241
 
1.6%
in 1693
 
1.2%
and 1631
 
1.2%
to 1054
 
0.8%
757
 
0.6%
man 665
 
0.5%
love 664
 
0.5%
for 601
 
0.4%
Other values (24353) 107390
78.9%

Most occurring characters

ValueCountFrequency (%)
90827
 
12.0%
e 76251
 
10.1%
a 48940
 
6.5%
o 45671
 
6.0%
n 40817
 
5.4%
r 40018
 
5.3%
i 39764
 
5.2%
t 36722
 
4.8%
s 29519
 
3.9%
h 28516
 
3.8%
Other values (277) 280815
37.1%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter 534134
70.5%
Uppercase Letter 117265
 
15.5%
Space Separator 90827
 
12.0%
Other Punctuation 10489
 
1.4%
Decimal Number 3850
 
0.5%
Dash Punctuation 981
 
0.1%
Close Punctuation 87
 
< 0.1%
Open Punctuation 85
 
< 0.1%
Final Punctuation 38
 
< 0.1%
Other Letter 25
 
< 0.1%
Other values (7) 79
 
< 0.1%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
e 76251
14.3%
a 48940
9.2%
o 45671
 
8.6%
n 40817
 
7.6%
r 40018
 
7.5%
i 39764
 
7.4%
t 36722
 
6.9%
s 29519
 
5.5%
h 28516
 
5.3%
l 25924
 
4.9%
Other values (121) 121992
22.8%
Uppercase Letter
ValueCountFrequency (%)
T 16019
13.7%
S 10336
 
8.8%
M 8031
 
6.8%
B 7659
 
6.5%
C 7165
 
6.1%
A 6785
 
5.8%
D 6335
 
5.4%
L 5872
 
5.0%
H 5170
 
4.4%
W 5166
 
4.4%
Other values (65) 38727
33.0%
Other Letter
ValueCountFrequency (%)
چ 2
 
8.0%
ه 2
 
8.0%
ی 2
 
8.0%
ک 2
 
8.0%
1
 
4.0%
1
 
4.0%
1
 
4.0%
1
 
4.0%
1
 
4.0%
ª 1
 
4.0%
Other values (11) 11
44.0%
Other Punctuation
ValueCountFrequency (%)
: 3717
35.4%
' 2505
23.9%
. 1603
15.3%
, 1134
 
10.8%
! 647
 
6.2%
& 458
 
4.4%
? 269
 
2.6%
/ 79
 
0.8%
* 19
 
0.2%
# 13
 
0.1%
Other values (8) 45
 
0.4%
Decimal Number
ValueCountFrequency (%)
2 861
22.4%
1 697
18.1%
0 616
16.0%
3 482
12.5%
9 230
 
6.0%
4 229
 
5.9%
5 225
 
5.8%
7 193
 
5.0%
8 161
 
4.2%
6 156
 
4.1%
Math Symbol
ValueCountFrequency (%)
+ 17
70.8%
× 3
 
12.5%
1
 
4.2%
= 1
 
4.2%
1
 
4.2%
1
 
4.2%
Other Number
ValueCountFrequency (%)
½ 12
63.2%
² 3
 
15.8%
³ 2
 
10.5%
1
 
5.3%
1
 
5.3%
Other Symbol
ValueCountFrequency (%)
° 3
37.5%
2
25.0%
1
 
12.5%
1
 
12.5%
1
 
12.5%
Currency Symbol
ValueCountFrequency (%)
$ 18
85.7%
¢ 2
 
9.5%
£ 1
 
4.8%
Dash Punctuation
ValueCountFrequency (%)
- 966
98.5%
15
 
1.5%
Close Punctuation
ValueCountFrequency (%)
) 82
94.3%
] 5
 
5.7%
Open Punctuation
ValueCountFrequency (%)
( 80
94.1%
[ 5
 
5.9%
Final Punctuation
ValueCountFrequency (%)
37
97.4%
1
 
2.6%
Initial Punctuation
ValueCountFrequency (%)
1
50.0%
1
50.0%
Space Separator
ValueCountFrequency (%)
90827
100.0%
Connector Punctuation
ValueCountFrequency (%)
_ 3
100.0%
Format
ValueCountFrequency (%)
2
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin 650884
85.9%
Common 106436
 
14.0%
Cyrillic 346
 
< 0.1%
Greek 170
 
< 0.1%
Arabic 11
 
< 0.1%
Katakana 8
 
< 0.1%
Han 5
 
< 0.1%

Most frequent character per script

Latin
ValueCountFrequency (%)
e 76251
 
11.7%
a 48940
 
7.5%
o 45671
 
7.0%
n 40817
 
6.3%
r 40018
 
6.1%
i 39764
 
6.1%
t 36722
 
5.6%
s 29519
 
4.5%
h 28516
 
4.4%
l 25924
 
4.0%
Other values (107) 238742
36.7%
Common
ValueCountFrequency (%)
90827
85.3%
: 3717
 
3.5%
' 2505
 
2.4%
. 1603
 
1.5%
, 1134
 
1.1%
- 966
 
0.9%
2 861
 
0.8%
1 697
 
0.7%
! 647
 
0.6%
0 616
 
0.6%
Other values (50) 2863
 
2.7%
Cyrillic
ValueCountFrequency (%)
е 32
 
9.2%
о 32
 
9.2%
а 29
 
8.4%
н 24
 
6.9%
и 23
 
6.6%
р 22
 
6.4%
к 17
 
4.9%
с 15
 
4.3%
л 14
 
4.0%
в 14
 
4.0%
Other values (38) 124
35.8%
Greek
ValueCountFrequency (%)
α 20
 
11.8%
ι 14
 
8.2%
ο 14
 
8.2%
τ 9
 
5.3%
ά 8
 
4.7%
λ 8
 
4.7%
ρ 8
 
4.7%
ν 7
 
4.1%
ε 6
 
3.5%
ς 6
 
3.5%
Other values (32) 70
41.2%
Katakana
ValueCountFrequency (%)
1
12.5%
1
12.5%
1
12.5%
1
12.5%
1
12.5%
1
12.5%
1
12.5%
1
12.5%
Arabic
ValueCountFrequency (%)
چ 2
18.2%
ه 2
18.2%
ی 2
18.2%
ک 2
18.2%
س 1
9.1%
ا 1
9.1%
ج 1
9.1%
Han
ValueCountFrequency (%)
1
20.0%
1
20.0%
1
20.0%
1
20.0%
1
20.0%

Most occurring blocks

ValueCountFrequency (%)
ASCII 756295
99.8%
None 1124
 
0.1%
Cyrillic 346
 
< 0.1%
Punctuation 62
 
< 0.1%
Arabic 11
 
< 0.1%
Katakana 8
 
< 0.1%
CJK 5
 
< 0.1%
Misc Symbols 3
 
< 0.1%
Letterlike Symbols 2
 
< 0.1%
Math Operators 2
 
< 0.1%
Other values (2) 2
 
< 0.1%

Most frequent character per block

ASCII
ValueCountFrequency (%)
90827
 
12.0%
e 76251
 
10.1%
a 48940
 
6.5%
o 45671
 
6.0%
n 40817
 
5.4%
r 40018
 
5.3%
i 39764
 
5.3%
t 36722
 
4.9%
s 29519
 
3.9%
h 28516
 
3.8%
Other values (76) 279250
36.9%
None
ValueCountFrequency (%)
é 218
19.4%
ä 127
 
11.3%
ö 55
 
4.9%
è 53
 
4.7%
ô 44
 
3.9%
ü 39
 
3.5%
ó 37
 
3.3%
á 35
 
3.1%
ı 35
 
3.1%
í 33
 
2.9%
Other values (108) 448
39.9%
Punctuation
ValueCountFrequency (%)
37
59.7%
15
24.2%
5
 
8.1%
2
 
3.2%
1
 
1.6%
1
 
1.6%
1
 
1.6%
Cyrillic
ValueCountFrequency (%)
е 32
 
9.2%
о 32
 
9.2%
а 29
 
8.4%
н 24
 
6.9%
и 23
 
6.6%
р 22
 
6.4%
к 17
 
4.9%
с 15
 
4.3%
л 14
 
4.0%
в 14
 
4.0%
Other values (38) 124
35.8%
Arabic
ValueCountFrequency (%)
چ 2
18.2%
ه 2
18.2%
ی 2
18.2%
ک 2
18.2%
س 1
9.1%
ا 1
9.1%
ج 1
9.1%
Misc Symbols
ValueCountFrequency (%)
2
66.7%
1
33.3%
CJK
ValueCountFrequency (%)
1
20.0%
1
20.0%
1
20.0%
1
20.0%
1
20.0%
Number Forms
ValueCountFrequency (%)
1
100.0%
Letterlike Symbols
ValueCountFrequency (%)
1
50.0%
1
50.0%
Math Operators
ValueCountFrequency (%)
1
50.0%
1
50.0%
Katakana
ValueCountFrequency (%)
1
12.5%
1
12.5%
1
12.5%
1
12.5%
1
12.5%
1
12.5%
1
12.5%
1
12.5%
Arrows
ValueCountFrequency (%)
1
100.0%

vote_average
Real number (ℝ)

Distinct: 92
Distinct (%): 0.2%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 5.62407
Minimum: 0
Maximum: 10
Zeros: 2947
Zeros (%): 6.5%
Negative: 0
Negative (%): 0.0%
Memory size: 354.6 KiB

Quantile statistics

Minimum: 0
5-th percentile: 0
Q1: 5
median: 6
Q3: 6.8
95-th percentile: 7.8
Maximum: 10
Range: 10
Interquartile range (IQR): 1.8

Descriptive statistics

Standard deviation: 1.9154225
Coefficient of variation (CV): 0.34057587
Kurtosis: 2.5420547
Mean: 5.62407
Median Absolute Deviation (MAD): 0.9
Skewness: -1.524472
Sum: 255197.8
Variance: 3.6688434
Monotonicity: Not monotonic
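Several of the vote_average figures can be cross-checked against one another with simple arithmetic. The sketch below recomputes the coefficient of variation, variance, sum, range and IQR from the other reported values; the variable names are chosen for this example, and small differences are expected because the inputs are already rounded.

```python
# Values copied from the vote_average summary.
n_rows = 45376
mean = 5.62407
std = 1.9154225
q1, q3 = 5.0, 6.8
minimum, maximum = 0.0, 10.0

cv = std / mean            # coefficient of variation
variance = std ** 2        # variance is the squared standard deviation
total = mean * n_rows      # sum of all values (no values are missing)
value_range = maximum - minimum
iqr = q3 - q1              # interquartile range

print(f"CV={cv:.8f} variance={variance:.7f} sum={total:.1f}")
print(f"range={value_range} IQR={iqr:.1f}")
```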
Histogram with fixed size bins (bins=50)
ValueCountFrequency (%)
0 2947
 
6.5%
6 2462
 
5.4%
5 1998
 
4.4%
7 1883
 
4.1%
6.5 1722
 
3.8%
6.3 1603
 
3.5%
5.5 1381
 
3.0%
5.8 1369
 
3.0%
6.4 1350
 
3.0%
6.7 1342
 
3.0%
Other values (82) 27319
60.2%
ValueCountFrequency (%)
0 2947
6.5%
0.5 13
 
< 0.1%
0.7 1
 
< 0.1%
1 103
 
0.2%
1.1 1
 
< 0.1%
1.2 4
 
< 0.1%
1.3 13
 
< 0.1%
1.4 5
 
< 0.1%
1.5 30
 
0.1%
1.6 6
 
< 0.1%
ValueCountFrequency (%)
10 185
0.4%
9.8 1
 
< 0.1%
9.6 1
 
< 0.1%
9.5 18
 
< 0.1%
9.4 3
 
< 0.1%
9.3 18
 
< 0.1%
9.2 4
 
< 0.1%
9.1 2
 
< 0.1%
9 158
0.3%
8.9 7
 
< 0.1%

release_year
Real number (ℝ)

Distinct: 135
Distinct (%): 0.3%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 1991.8812
Minimum: 1874
Maximum: 2020
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 354.6 KiB

Quantile statistics

Minimum: 1874
5-th percentile: 1941
Q1: 1978
median: 2001
Q3: 2010
95-th percentile: 2015
Maximum: 2020
Range: 146
Interquartile range (IQR): 32

Descriptive statistics

Standard deviation: 24.05536
Coefficient of variation (CV): 0.012076704
Kurtosis: 0.84010576
Mean: 1991.8812
Median Absolute Deviation (MAD): 12
Skewness: -1.2248636
Sum: 90383601
Variance: 578.66033
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)
ValueCountFrequency (%)
2014 1974
 
4.4%
2015 1905
 
4.2%
2013 1889
 
4.2%
2012 1722
 
3.8%
2011 1667
 
3.7%
2016 1604
 
3.5%
2009 1586
 
3.5%
2010 1501
 
3.3%
2008 1473
 
3.2%
2007 1320
 
2.9%
Other values (125) 28735
63.3%
ValueCountFrequency (%)
1874 1
 
< 0.1%
1878 1
 
< 0.1%
1883 1
 
< 0.1%
1887 1
 
< 0.1%
1888 2
 
< 0.1%
1890 5
 
< 0.1%
1891 6
< 0.1%
1892 3
 
< 0.1%
1893 1
 
< 0.1%
1894 13
< 0.1%
ValueCountFrequency (%)
2020 1
 
< 0.1%
2018 5
 
< 0.1%
2017 532
 
1.2%
2016 1604
3.5%
2015 1905
4.2%
2014 1974
4.4%
2013 1889
4.2%
2012 1722
3.8%
2011 1667
3.7%
2010 1501
3.3%

return
Real number (ℝ)

HIGH CORRELATION  SKEWED  ZEROS 

Distinct: 5232
Distinct (%): 11.5%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 660.04278
Minimum: 0
Maximum: 12396383
Zeros: 39995
Zeros (%): 88.1%
Negative: 0
Negative (%): 0.0%
Memory size: 354.6 KiB

Quantile statistics

Minimum: 0
5-th percentile: 0
Q1: 0
median: 0
Q3: 0
95-th percentile: 2.5355363
Maximum: 12396383
Range: 12396383
Interquartile range (IQR): 0

Descriptive statistics

Standard deviation: 74693.294
Coefficient of variation (CV): 113.16432
Kurtosis: 20672.957
Mean: 660.04278
Median Absolute Deviation (MAD): 0
Skewness: 138.32953
Sum: 29950101
Variance: 5.5790882 × 10^9
Monotonicity: Not monotonic
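The zero-heavy shape of this variable explains why the quartiles, IQR and MAD are all 0 while the mean and standard deviation are large. The sketch below works through that reasoning from the reported counts; the variable names are chosen for this example and the inputs are the rounded figures shown in the summary.

```python
# Values copied from the summary of the `return` variable.
n_rows = 45376
n_zero = 39995
mean = 660.04278
std = 74693.294

zero_share = n_zero / n_rows

# With no negative values and more than 75% of rows equal to 0,
# the 25th, 50th and 75th percentiles must all be 0, so the IQR is 0 too.
quartiles_are_zero = zero_share > 0.75

# The few very large values dominate the mean and standard deviation,
# which is why the coefficient of variation is far above 1.
cv = std / mean

print(f"zeros: {100 * zero_share:.1f}%  quartiles all zero: {quartiles_are_zero}")
print(f"CV: {cv:.5f}")
```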
Histogram with fixed size bins (bins=50)
ValueCountFrequency (%)
0 39995
88.1%
1 20
 
< 0.1%
2 12
 
< 0.1%
4 11
 
< 0.1%
5 8
 
< 0.1%
3 7
 
< 0.1%
2.5 7
 
< 0.1%
1.333333333 7
 
< 0.1%
1.5 6
 
< 0.1%
7 4
 
< 0.1%
Other values (5222) 5299
 
11.7%
Value                   Count  Frequency (%)
0                       39995  88.1%
5.217391304 × 10^-7         1  < 0.1%
7.5 × 10^-7                 1  < 0.1%
9.375 × 10^-7               1  < 0.1%
1.499133126 × 10^-6         1  < 0.1%
1.8 × 10^-6                 1  < 0.1%
1.916666667 × 10^-6         1  < 0.1%
3.5 × 10^-6                 1  < 0.1%
4 × 10^-6                   1  < 0.1%
5.111111111 × 10^-6         1  < 0.1%
ValueCountFrequency (%)
12396383 1
< 0.1%
8500000 1
< 0.1%
4197476.625 1
< 0.1%
2755584 1
< 0.1%
1018619.283 1
< 0.1%
1000000 1
< 0.1%
26881.72043 1
< 0.1%
12890.38667 1
< 0.1%
5330.33945 1
< 0.1%
4133.333333 1
< 0.1%

id_collection
Real number (ℝ)

Distinct: 1695
Distinct (%): 37.8%
Missing: 40888
Missing (%): 90.1%
Infinite: 0
Infinite (%): 0.0%
Mean: 184073.41
Minimum: 10
Maximum: 480160
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 354.6 KiB
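The two percentages in this summary appear to use different denominators: the missing share is taken over all rows, while the distinct share matches the reported figure only when taken over the non-missing rows. The sketch below checks this against the reported counts; the variable names are illustrative, and the interpretation is inferred from the numbers rather than from the tool's documentation.

```python
# Counts copied from the dataset overview and the id_collection summary.
n_rows = 45376
n_missing = 40888
n_distinct = 1695

n_present = n_rows - n_missing  # rows that actually carry a collection id

missing_pct = 100 * n_missing / n_rows       # share of all rows
distinct_pct = 100 * n_distinct / n_present  # share of non-missing rows

print(f"present rows: {n_present}")
print(f"missing: {missing_pct:.1f}%  distinct: {distinct_pct:.1f}%")
```

Dividing the distinct count by all 45376 rows would give roughly 3.7%, which does not match the reported 37.8%.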

Quantile statistics

Minimum: 10
5-th percentile: 2704
Q1: 86026.25
median: 141531.5
Q3: 294172
95-th percentile: 439530.55
Maximum: 480160
Range: 480150
Interquartile range (IQR): 208145.75

Descriptive statistics

Standard deviation: 141630.53
Coefficient of variation (CV): 0.76942417
Kurtosis: -0.9276419
Mean: 184073.41
Median Absolute Deviation (MAD): 103794.5
Skewness: 0.53309635
Sum: 8.2612146 × 10^8
Variance: 2.0059207 × 10^10
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)
ValueCountFrequency (%)
415931 29
 
0.1%
421566 27
 
0.1%
645 26
 
0.1%
96887 26
 
0.1%
37261 25
 
0.1%
34055 22
 
< 0.1%
413661 21
 
< 0.1%
374509 16
 
< 0.1%
425164 15
 
< 0.1%
148324 15
 
< 0.1%
Other values (1685) 4266
 
9.4%
(Missing) 40888
90.1%
ValueCountFrequency (%)
10 8
< 0.1%
84 4
< 0.1%
119 3
 
< 0.1%
131 3
 
< 0.1%
151 6
< 0.1%
230 3
 
< 0.1%
263 3
 
< 0.1%
264 3
 
< 0.1%
295 5
< 0.1%
304 3
 
< 0.1%
ValueCountFrequency (%)
480160 1
 
< 0.1%
480071 1
 
< 0.1%
479971 1
 
< 0.1%
479888 2
 
< 0.1%
479692 2
 
< 0.1%
479549 1
 
< 0.1%
479319 13
< 0.1%
478947 2
 
< 0.1%
478628 12
< 0.1%
478545 1
 
< 0.1%

name_collection
Categorical

HIGH CARDINALITY  MISSING 

Distinct: 1695
Distinct (%): 37.8%
Missing: 40888
Missing (%): 90.1%
Memory size: 354.6 KiB
The Bowery Boys                    29
Totò Collection                    27
James Bond Collection              26
Zatôichi: The Blind Swordsman      26
The Carry On Collection            25
Other values (1690)              4355

Length

Max length: 54
Median length: 43
Mean length: 23.855838
Min length: 3

Characters and Unicode

Total characters: 107065
Distinct characters: 166
Distinct categories: 12
Distinct scripts: 7
Distinct blocks: 8
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 390
Unique (%): 8.7%

Sample

1st row: Toy Story Collection
2nd row: Grumpy Old Men Collection
3rd row: Father of the Bride Collection
4th row: James Bond Collection
5th row: Balto Collection

Common Values

Value                                   Count  Frequency (%)
The Bowery Boys                            29  0.1%
Totò Collection                            27  0.1%
James Bond Collection                      26  0.1%
Zatôichi: The Blind Swordsman              26  0.1%
The Carry On Collection                    25  0.1%
Pokémon Collection                         22  < 0.1%
Charlie Chan (Sidney Toler) Collection     21  < 0.1%
Godzilla (Showa) Collection                16  < 0.1%
Dragon Ball Z (Movie) Collection           15  < 0.1%
Uuno Turhapuro                             15  < 0.1%
Other values (1685)                      4266  9.4%
(Missing)                               40888  90.1%

Length

Histogram of lengths of the category
ValueCountFrequency (%)
collection 3743
25.3%
the 1146
 
7.8%
of 230
 
1.6%
series 147
 
1.0%
139
 
0.9%
trilogy 87
 
0.6%
and 84
 
0.6%
man 62
 
0.4%
a 62
 
0.4%
in 56
 
0.4%
Other values (2407) 9028
61.1%

Most occurring characters

ValueCountFrequency (%)
o 11114
 
10.4%
e 10450
 
9.8%
10297
 
9.6%
l 10200
 
9.5%
i 7559
 
7.1%
n 7403
 
6.9%
t 6488
 
6.1%
c 4845
 
4.5%
C 4474
 
4.2%
a 4459
 
4.2%
Other values (156) 29776
27.8%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter 81103
75.8%
Uppercase Letter 13885
 
13.0%
Space Separator 10297
 
9.6%
Other Punctuation 576
 
0.5%
Open Punctuation 335
 
0.3%
Close Punctuation 335
 
0.3%
Decimal Number 321
 
0.3%
Dash Punctuation 162
 
0.2%
Other Letter 37
 
< 0.1%
Final Punctuation 9
 
< 0.1%
Other values (2) 5
 
< 0.1%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
o 11114
13.7%
e 10450
12.9%
l 10200
12.6%
i 7559
9.3%
n 7403
9.1%
t 6488
8.0%
c 4845
 
6.0%
a 4459
 
5.5%
r 3870
 
4.8%
s 2588
 
3.2%
Other values (69) 12127
15.0%
Uppercase Letter
ValueCountFrequency (%)
C 4474
32.2%
T 1527
 
11.0%
S 1063
 
7.7%
B 682
 
4.9%
M 630
 
4.5%
A 509
 
3.7%
D 505
 
3.6%
H 462
 
3.3%
P 432
 
3.1%
G 417
 
3.0%
Other values (33) 3184
22.9%
Other Letter
ValueCountFrequency (%)
3
 
8.1%
3
 
8.1%
3
 
8.1%
3
 
8.1%
3
 
8.1%
3
 
8.1%
3
 
8.1%
3
 
8.1%
3
 
8.1%
2
 
5.4%
Other values (4) 8
21.6%
Other Punctuation
ValueCountFrequency (%)
. 172
29.9%
' 107
18.6%
: 99
17.2%
, 79
13.7%
& 52
 
9.0%
! 35
 
6.1%
/ 21
 
3.6%
? 4
 
0.7%
* 4
 
0.7%
3
 
0.5%
Decimal Number
ValueCountFrequency (%)
1 80
24.9%
9 64
19.9%
3 54
16.8%
0 51
15.9%
2 21
 
6.5%
8 13
 
4.0%
5 12
 
3.7%
7 11
 
3.4%
6 10
 
3.1%
4 5
 
1.6%
Open Punctuation
ValueCountFrequency (%)
( 330
98.5%
[ 5
 
1.5%
Close Punctuation
ValueCountFrequency (%)
) 330
98.5%
] 5
 
1.5%
Dash Punctuation
ValueCountFrequency (%)
- 160
98.8%
2
 
1.2%
Space Separator
ValueCountFrequency (%)
10297
100.0%
Final Punctuation
ValueCountFrequency (%)
9
100.0%
Modifier Letter
ValueCountFrequency (%)
3
100.0%
Other Number
ValueCountFrequency (%)
½ 2
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin 94574
88.3%
Common 12040
 
11.2%
Cyrillic 414
 
0.4%
Hiragana 15
 
< 0.1%
Hangul 10
 
< 0.1%
Katakana 9
 
< 0.1%
Han 3
 
< 0.1%

Most frequent character per script

Latin
ValueCountFrequency (%)
o 11114
11.8%
e 10450
11.0%
l 10200
10.8%
i 7559
 
8.0%
n 7403
 
7.8%
t 6488
 
6.9%
c 4845
 
5.1%
C 4474
 
4.7%
a 4459
 
4.7%
r 3870
 
4.1%
Other values (70) 23712
25.1%
Cyrillic
ValueCountFrequency (%)
л 48
 
11.6%
и 41
 
9.9%
о 37
 
8.9%
к 30
 
7.2%
е 27
 
6.5%
я 25
 
6.0%
а 17
 
4.1%
ц 16
 
3.9%
К 16
 
3.9%
р 14
 
3.4%
Other values (32) 143
34.5%
Common
ValueCountFrequency (%)
10297
85.5%
( 330
 
2.7%
) 330
 
2.7%
. 172
 
1.4%
- 160
 
1.3%
' 107
 
0.9%
: 99
 
0.8%
1 80
 
0.7%
, 79
 
0.7%
9 64
 
0.5%
Other values (20) 322
 
2.7%
Hiragana
ValueCountFrequency (%)
3
20.0%
3
20.0%
3
20.0%
3
20.0%
3
20.0%
Hangul
ValueCountFrequency (%)
2
20.0%
2
20.0%
2
20.0%
2
20.0%
2
20.0%
Katakana
ValueCountFrequency (%)
3
33.3%
3
33.3%
3
33.3%
Han
ValueCountFrequency (%)
3
100.0%

Most occurring blocks

ValueCountFrequency (%)
ASCII 106351
99.3%
Cyrillic 414
 
0.4%
None 246
 
0.2%
Hiragana 15
 
< 0.1%
Punctuation 14
 
< 0.1%
Katakana 12
 
< 0.1%
Hangul 10
 
< 0.1%
CJK 3
 
< 0.1%

Most frequent character per block

ASCII
ValueCountFrequency (%)
o 11114
 
10.5%
e 10450
 
9.8%
10297
 
9.7%
l 10200
 
9.6%
i 7559
 
7.1%
n 7403
 
7.0%
t 6488
 
6.1%
c 4845
 
4.6%
C 4474
 
4.2%
a 4459
 
4.2%
Other values (67) 29062
27.3%
Cyrillic
ValueCountFrequency (%)
л 48
 
11.6%
и 41
 
9.9%
о 37
 
8.9%
к 30
 
7.2%
е 27
 
6.5%
я 25
 
6.0%
а 17
 
4.1%
ц 16
 
3.9%
К 16
 
3.9%
р 14
 
3.4%
Other values (32) 143
34.5%
None
ValueCountFrequency (%)
é 45
18.3%
ä 40
16.3%
ô 35
14.2%
ò 28
11.4%
ö 19
7.7%
ó 14
 
5.7%
ı 14
 
5.7%
í 9
 
3.7%
á 4
 
1.6%
İ 4
 
1.6%
Other values (19) 34
13.8%
Punctuation
ValueCountFrequency (%)
9
64.3%
3
 
21.4%
2
 
14.3%
Hiragana
ValueCountFrequency (%)
3
20.0%
3
20.0%
3
20.0%
3
20.0%
3
20.0%
Katakana
ValueCountFrequency (%)
3
25.0%
3
25.0%
3
25.0%
3
25.0%
CJK
ValueCountFrequency (%)
3
100.0%
Hangul
ValueCountFrequency (%)
2
20.0%
2
20.0%
2
20.0%
2
20.0%
2
20.0%

id_genres
Categorical

HIGH CARDINALITY  MISSING 

Distinct: 4064
Distinct (%): 9.5%
Missing: 2384
Missing (%): 5.3%
Memory size: 354.6 KiB
18.0                      4998
35.0                      3621
99.0                      2713
18.0,10749.0              1301
35.0,18.0                 1135
Other values (4059)      29224

Length

Max length: 49
Median length: 44
Mean length: 10.806313
Min length: 4

Characters and Unicode

Total characters: 464585
Distinct characters: 12
Distinct categories: 2
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 2364
Unique (%): 5.5%

Sample

1st row: 16.0,35.0,10751.0
2nd row: 12.0,14.0,10751.0
3rd row: 10749.0,35.0
4th row: 35.0,18.0,10749.0
5th row: 35.0
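The sample rows show that each id_genres cell is a comma-separated string of ids written with a trailing ".0". A minimal sketch for turning such a cell into a list of integers follows; the function name `parse_ids` is a hypothetical helper introduced for this example, and it assumes every non-empty part is a well-formed number.

```python
def parse_ids(cell):
    """Split a comma-separated cell such as '16.0,35.0' into integer ids."""
    # float() accepts the '.0' suffix; int() then drops it.
    return [int(float(part)) for part in cell.split(",") if part]


# Sample cells copied from the rows shown above.
samples = ["16.0,35.0,10751.0", "10749.0,35.0", "35.0"]
parsed = [parse_ids(cell) for cell in samples]
print(parsed)
```

Parsing the cells this way lets ids be compared individually, which the combined strings do not allow (for example, "35.0,18.0" and "18.0,35.0" are counted as different values in the tables).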

Common Values

Value                 Count  Frequency (%)
18.0                   4998  11.0%
35.0                   3621  8.0%
99.0                   2713  6.0%
18.0,10749.0           1301  2.9%
35.0,18.0              1135  2.5%
27.0                    974  2.1%
35.0,10749.0            930  2.0%
35.0,18.0,10749.0       593  1.3%
18.0,35.0               532  1.2%
27.0,53.0               528  1.2%
Other values (4054)   25667  56.6%
(Missing)              2384  5.3%

Length

Histogram of lengths of the category
ValueCountFrequency (%)
18.0 4998
 
11.6%
35.0 3621
 
8.4%
99.0 2713
 
6.3%
18.0,10749.0 1301
 
3.0%
35.0,18.0 1135
 
2.6%
27.0 974
 
2.3%
35.0,10749.0 930
 
2.2%
35.0,18.0,10749.0 593
 
1.4%
18.0,35.0 532
 
1.2%
27.0,53.0 528
 
1.2%
Other values (4054) 25667
59.7%

Most occurring characters

ValueCountFrequency (%)
0 112514
24.2%
. 91036
19.6%
, 48044
10.3%
1 45571
9.8%
8 39700
 
8.5%
5 24891
 
5.4%
3 23239
 
5.0%
7 22731
 
4.9%
9 18660
 
4.0%
2 17677
 
3.8%
Other values (2) 20522
 
4.4%

Most occurring categories

ValueCountFrequency (%)
Decimal Number 325505
70.1%
Other Punctuation 139080
29.9%

Most frequent character per category

Decimal Number
ValueCountFrequency (%)
0 112514
34.6%
1 45571
14.0%
8 39700
 
12.2%
5 24891
 
7.6%
3 23239
 
7.1%
7 22731
 
7.0%
9 18660
 
5.7%
2 17677
 
5.4%
4 13108
 
4.0%
6 7414
 
2.3%
Other Punctuation
ValueCountFrequency (%)
. 91036
65.5%
, 48044
34.5%

Most occurring scripts

ValueCountFrequency (%)
Common 464585
100.0%

Most frequent character per script

Common
ValueCountFrequency (%)
0 112514
24.2%
. 91036
19.6%
, 48044
10.3%
1 45571
9.8%
8 39700
 
8.5%
5 24891
 
5.4%
3 23239
 
5.0%
7 22731
 
4.9%
9 18660
 
4.0%
2 17677
 
3.8%
Other values (2) 20522
 
4.4%

Most occurring blocks

ValueCountFrequency (%)
ASCII 464585
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
0 112514
24.2%
. 91036
19.6%
, 48044
10.3%
1 45571
9.8%
8 39700
 
8.5%
5 24891
 
5.4%
3 23239
 
5.0%
7 22731
 
4.9%
9 18660
 
4.0%
2 17677
 
3.8%
Other values (2) 20522
 
4.4%

name_genres
Categorical

HIGH CARDINALITY  MISSING 

Distinct: 4064
Distinct (%): 9.5%
Missing: 2384
Missing (%): 5.3%
Memory size: 354.6 KiB
Drama                     4998
Comedy                    3621
Documentary               2713
Drama,Romance             1301
Comedy,Drama              1135
Other values (4059)      29224

Length

Max length: 73
Median length: 59
Mean length: 15.345506
Min length: 3

Characters and Unicode

Total characters: 659734
Distinct characters: 30
Distinct categories: 4
Distinct scripts: 2
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 2364
Unique (%): 5.5%

Sample

1st row: Animation,Comedy,Family
2nd row: Adventure,Fantasy,Family
3rd row: Romance,Comedy
4th row: Comedy,Drama,Romance
5th row: Comedy
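Because each cell holds a comma-separated combination of genres, the distinct count reflects combinations rather than individual genres. A rough sketch of counting individual genres is shown below, using only the five sample rows above as input; a real analysis would iterate over every non-missing cell.

```python
from collections import Counter

# The five sample rows shown above.
rows = [
    "Animation,Comedy,Family",
    "Adventure,Fantasy,Family",
    "Romance,Comedy",
    "Comedy,Drama,Romance",
    "Comedy",
]

genre_counts = Counter()
for row in rows:
    # Split on commas only, so multi-word genre names stay intact.
    genre_counts.update(genre.strip() for genre in row.split(","))

for genre, count in genre_counts.most_common():
    print(f"{genre:<10} {count}")
```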

Common Values

Value                  Count  Frequency (%)
Drama                   4998  11.0%
Comedy                  3621  8.0%
Documentary             2713  6.0%
Drama,Romance           1301  2.9%
Comedy,Drama            1135  2.5%
Horror                   974  2.1%
Comedy,Romance           930  2.0%
Comedy,Drama,Romance     593  1.3%
Drama,Comedy             532  1.2%
Horror,Thriller          528  1.2%
Other values (4054)    25667  56.6%
(Missing)               2384  5.3%

Length

Histogram of lengths of the category
ValueCountFrequency (%)
drama 4998
 
10.7%
comedy 3621
 
7.7%
documentary 2713
 
5.8%
fiction 1758
 
3.8%
drama,romance 1301
 
2.8%
comedy,drama 1135
 
2.4%
horror 974
 
2.1%
comedy,romance 930
 
2.0%
science 646
 
1.4%
comedy,drama,romance 593
 
1.3%
Other values (3740) 28131
60.1%

Most occurring characters

ValueCountFrequency (%)
r 69070
 
10.5%
a 61813
 
9.4%
e 55766
 
8.5%
m 53095
 
8.0%
o 48525
 
7.4%
, 48044
 
7.3%
i 39656
 
6.0%
n 35664
 
5.4%
y 28508
 
4.3%
c 27970
 
4.2%
Other values (20) 191623
29.0%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter 512272
77.6%
Uppercase Letter 95610
 
14.5%
Other Punctuation 48044
 
7.3%
Space Separator 3808
 
0.6%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
r 69070
13.5%
a 61813
12.1%
e 55766
10.9%
m 53095
10.4%
o 48525
9.5%
i 39656
7.7%
n 35664
7.0%
y 28508
5.6%
c 27970
5.5%
t 26197
 
5.1%
Other values (7) 66008
12.9%
Uppercase Letter
ValueCountFrequency (%)
D 24176
25.3%
C 17486
18.3%
A 12018
12.6%
F 9744
10.2%
T 8385
 
8.8%
R 6733
 
7.0%
H 6067
 
6.3%
M 4828
 
5.0%
S 3042
 
3.2%
W 2365
 
2.5%
Other Punctuation
ValueCountFrequency (%)
, 48044
100.0%
Space Separator
ValueCountFrequency (%)
3808
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin 607882
92.1%
Common 51852
 
7.9%

Most frequent character per script

Latin
ValueCountFrequency (%)
r 69070
11.4%
a 61813
 
10.2%
e 55766
 
9.2%
m 53095
 
8.7%
o 48525
 
8.0%
i 39656
 
6.5%
n 35664
 
5.9%
y 28508
 
4.7%
c 27970
 
4.6%
t 26197
 
4.3%
Other values (18) 161618
26.6%
Common
ValueCountFrequency (%)
, 48044
92.7%
3808
 
7.3%

Most occurring blocks

ValueCountFrequency (%)
ASCII 659734
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
r 69070
 
10.5%
a 61813
 
9.4%
e 55766
 
8.5%
m 53095
 
8.0%
o 48525
 
7.4%
, 48044
 
7.3%
i 39656
 
6.0%
n 35664
 
5.4%
y 28508
 
4.3%
c 27970
 
4.2%
Other values (20) 191623
29.0%

id_production
Categorical

HIGH CARDINALITY  MISSING 

Distinct: 22702
Distinct (%): 67.6%
Missing: 11796
Missing (%): 26.0%
Memory size: 354.6 KiB
8411.0                      742
6194.0                      540
4.0                         505
306.0                       439
33.0                        320
Other values (22697)      31034

Length

Max length: 198
Median length: 180
Mean length: 13.976534
Min length: 3

Characters and Unicode

Total characters: 469332
Distinct characters: 12
Distinct categories: 2
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 20341
Unique (%): 60.6%

Sample

1st row: 3.0
2nd row: 559.0,2550.0,10201.0
3rd row: 6194.0,19464.0
4th row: 306.0
5th row: 5842.0,9195.0

Common Values

Value                  Count  Frequency (%)
8411.0                   742  1.6%
6194.0                   540  1.2%
4.0                      505  1.1%
306.0                    439  1.0%
33.0                     320  0.7%
6.0                      247  0.5%
441.0                    207  0.5%
5.0                      146  0.3%
5120.0                   145  0.3%
2.0                       85  0.2%
Other values (22692)   30204  66.6%
(Missing)              11796  26.0%

Length

Histogram of lengths of the category
ValueCountFrequency (%)
8411.0 742
 
2.2%
6194.0 540
 
1.6%
4.0 505
 
1.5%
306.0 439
 
1.3%
33.0 320
 
1.0%
6.0 247
 
0.7%
441.0 207
 
0.6%
5.0 146
 
0.4%
5120.0 145
 
0.4%
2.0 85
 
0.3%
Other values (22692) 30204
89.9%

Most occurring characters

ValueCountFrequency (%)
0 93776
20.0%
. 70530
15.0%
1 44353
9.5%
, 36950
 
7.9%
2 32511
 
6.9%
3 31287
 
6.7%
4 30198
 
6.4%
6 27897
 
5.9%
5 27632
 
5.9%
8 25668
 
5.5%
Other values (2) 48530
10.3%

Most occurring categories

ValueCountFrequency (%)
Decimal Number 361852
77.1%
Other Punctuation 107480
 
22.9%

Most frequent character per category

Decimal Number
ValueCountFrequency (%)
0 93776
25.9%
1 44353
12.3%
2 32511
 
9.0%
3 31287
 
8.6%
4 30198
 
8.3%
6 27897
 
7.7%
5 27632
 
7.6%
8 25668
 
7.1%
7 24378
 
6.7%
9 24152
 
6.7%
Other Punctuation
ValueCountFrequency (%)
. 70530
65.6%
, 36950
34.4%

Most occurring scripts

ValueCountFrequency (%)
Common 469332
100.0%

Most frequent character per script

Common
ValueCountFrequency (%)
0 93776
20.0%
. 70530
15.0%
1 44353
9.5%
, 36950
 
7.9%
2 32511
 
6.9%
3 31287
 
6.7%
4 30198
 
6.4%
6 27897
 
5.9%
5 27632
 
5.9%
8 25668
 
5.5%
Other values (2) 48530
10.3%

Most occurring blocks

ValueCountFrequency (%)
ASCII 469332
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
0 93776
20.0%
. 70530
15.0%
1 44353
9.5%
, 36950
 
7.9%
2 32511
 
6.9%
3 31287
 
6.7%
4 30198
 
6.4%
6 27897
 
5.9%
5 27632
 
5.9%
8 25668
 
5.5%
Other values (2) 48530
10.3%

name_production
Categorical

HIGH CARDINALITY  MISSING 

Distinct: 22667
Distinct (%): 67.5%
Missing: 11796
Missing (%): 26.0%
Memory size: 354.6 KiB

Metro-Goldwyn-Mayer (MGM) 742
Warner Bros. 540
Paramount Pictures 505
Twentieth Century Fox Film Corporation 439
Universal Pictures 320
Other values (22662) 31034

Length

Max length: 584
Median length: 392
Mean length: 40.393895
Min length: 2

Characters and Unicode

Total characters: 1356427
Distinct characters: 294
Distinct categories: 17
Distinct scripts: 6
Distinct blocks: 6
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 20300
Unique (%): 60.5%

Sample

1st row: Pixar Animation Studios
2nd row: TriStar Pictures,Teitler Film,Interscope Communications
3rd row: Warner Bros.,Lancaster Gate
4th row: Twentieth Century Fox Film Corporation
5th row: Sandollar Productions,Touchstone Pictures

Common Values

Value | Count | Frequency (%)
Metro-Goldwyn-Mayer (MGM) | 742 | 1.6%
Warner Bros. | 540 | 1.2%
Paramount Pictures | 505 | 1.1%
Twentieth Century Fox Film Corporation | 439 | 1.0%
Universal Pictures | 320 | 0.7%
RKO Radio Pictures | 247 | 0.5%
Columbia Pictures Corporation | 207 | 0.5%
Columbia Pictures | 146 | 0.3%
Mosfilm | 145 | 0.3%
Walt Disney Pictures | 85 | 0.2%
Other values (22657) | 30204 | 66.6%
(Missing) | 11796 | 26.0%

Length

Histogram of lengths of the category
ValueCountFrequency (%)
productions 5415
 
3.8%
film 5178
 
3.7%
pictures 4968
 
3.5%
films 4878
 
3.5%
entertainment 2576
 
1.8%
corporation 1396
 
1.0%
company 1138
 
0.8%
fox 1129
 
0.8%
de 1057
 
0.8%
paramount 1036
 
0.7%
Other values (35959) 111962
79.6%

Most occurring characters

ValueCountFrequency (%)
107160
 
7.9%
i 106938
 
7.9%
e 94644
 
7.0%
n 89969
 
6.6%
o 85292
 
6.3%
r 83547
 
6.2%
t 83433
 
6.2%
a 77143
 
5.7%
s 62667
 
4.6%
l 51264
 
3.8%
Other values (284) 514370
37.9%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter 987018
72.8%
Uppercase Letter 198965
 
14.7%
Space Separator 107165
 
7.9%
Other Punctuation 45099
 
3.3%
Decimal Number 4347
 
0.3%
Dash Punctuation 4331
 
0.3%
Open Punctuation 4328
 
0.3%
Close Punctuation 4327
 
0.3%
Math Symbol 662
 
< 0.1%
Other Letter 140
 
< 0.1%
Other values (7) 45
 
< 0.1%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
i 106938
10.8%
e 94644
9.6%
n 89969
9.1%
o 85292
8.6%
r 83547
8.5%
t 83433
8.5%
a 77143
 
7.8%
s 62667
 
6.3%
l 51264
 
5.2%
m 44275
 
4.5%
Other values (102) 207846
21.1%
Other Letter
ValueCountFrequency (%)
9
 
6.4%
8
 
5.7%
6
 
4.3%
5
 
3.6%
5
 
3.6%
5
 
3.6%
5
 
3.6%
5
 
3.6%
4
 
2.9%
3
 
2.1%
Other values (62) 85
60.7%
Uppercase Letter
ValueCountFrequency (%)
P 27880
14.0%
F 26362
13.2%
C 20585
 
10.3%
M 13361
 
6.7%
S 11911
 
6.0%
E 9746
 
4.9%
A 9547
 
4.8%
T 9356
 
4.7%
B 9001
 
4.5%
G 7811
 
3.9%
Other values (52) 53405
26.8%
Other Punctuation
ValueCountFrequency (%)
, 37354
82.8%
. 5671
 
12.6%
& 764
 
1.7%
/ 645
 
1.4%
' 451
 
1.0%
" 133
 
0.3%
! 36
 
0.1%
% 18
 
< 0.1%
: 9
 
< 0.1%
@ 5
 
< 0.1%
Other values (6) 13
 
< 0.1%
Decimal Number
ValueCountFrequency (%)
2 1034
23.8%
1 712
16.4%
0 641
14.7%
3 556
12.8%
4 481
11.1%
9 205
 
4.7%
6 195
 
4.5%
5 178
 
4.1%
8 173
 
4.0%
7 172
 
4.0%
Open Punctuation
ValueCountFrequency (%)
( 4318
99.8%
[ 9
 
0.2%
1
 
< 0.1%
Close Punctuation
ValueCountFrequency (%)
) 4317
99.8%
] 9
 
0.2%
1
 
< 0.1%
Space Separator
ValueCountFrequency (%)
107160
> 99.9%
  5
 
< 0.1%
Dash Punctuation
ValueCountFrequency (%)
- 4329
> 99.9%
2
 
< 0.1%
Math Symbol
ValueCountFrequency (%)
+ 661
99.8%
| 1
 
0.2%
Other Symbol
ValueCountFrequency (%)
° 23
92.0%
2
 
8.0%
Final Punctuation
ValueCountFrequency (%)
3
50.0%
» 3
50.0%
Other Number
ValueCountFrequency (%)
² 1
50.0%
½ 1
50.0%
Connector Punctuation
ValueCountFrequency (%)
_ 4
100.0%
Control
ValueCountFrequency (%)
4
100.0%
Initial Punctuation
ValueCountFrequency (%)
« 3
100.0%
Format
ValueCountFrequency (%)
1
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin 1185580
87.4%
Common 170302
 
12.6%
Cyrillic 373
 
< 0.1%
Hangul 115
 
< 0.1%
Greek 31
 
< 0.1%
Han 26
 
< 0.1%

Most frequent character per script

Latin
ValueCountFrequency (%)
i 106938
 
9.0%
e 94644
 
8.0%
n 89969
 
7.6%
o 85292
 
7.2%
r 83547
 
7.0%
t 83433
 
7.0%
a 77143
 
6.5%
s 62667
 
5.3%
l 51264
 
4.3%
m 44275
 
3.7%
Other values (99) 406408
34.3%
Hangul
ValueCountFrequency (%)
9
 
7.8%
8
 
7.0%
6
 
5.2%
5
 
4.3%
5
 
4.3%
5
 
4.3%
5
 
4.3%
5
 
4.3%
4
 
3.5%
3
 
2.6%
Other values (43) 60
52.2%
Common
ValueCountFrequency (%)
107160
62.9%
, 37354
 
21.9%
. 5671
 
3.3%
- 4329
 
2.5%
( 4318
 
2.5%
) 4317
 
2.5%
2 1034
 
0.6%
& 764
 
0.4%
1 712
 
0.4%
+ 661
 
0.4%
Other values (37) 3982
 
2.3%
Cyrillic
ValueCountFrequency (%)
и 34
 
9.1%
о 28
 
7.5%
а 26
 
7.0%
л 22
 
5.9%
н 20
 
5.4%
м 19
 
5.1%
т 17
 
4.6%
е 16
 
4.3%
с 16
 
4.3%
ь 16
 
4.3%
Other values (36) 159
42.6%
Greek
ValueCountFrequency (%)
ν 3
 
9.7%
ο 3
 
9.7%
Κ 2
 
6.5%
Ε 2
 
6.5%
λ 2
 
6.5%
τ 2
 
6.5%
η 2
 
6.5%
ρ 2
 
6.5%
ι 2
 
6.5%
έ 1
 
3.2%
Other values (10) 10
32.3%
Han
ValueCountFrequency (%)
2
 
7.7%
2
 
7.7%
2
 
7.7%
2
 
7.7%
2
 
7.7%
2
 
7.7%
2
 
7.7%
1
 
3.8%
1
 
3.8%
1
 
3.8%
Other values (9) 9
34.6%

Most occurring blocks

ValueCountFrequency (%)
ASCII 1350197
99.5%
None 5711
 
0.4%
Cyrillic 373
 
< 0.1%
Hangul 113
 
< 0.1%
CJK 26
 
< 0.1%
Punctuation 7
 
< 0.1%

Most frequent character per block

ASCII
ValueCountFrequency (%)
107160
 
7.9%
i 106938
 
7.9%
e 94644
 
7.0%
n 89969
 
6.7%
o 85292
 
6.3%
r 83547
 
6.2%
t 83433
 
6.2%
a 77143
 
5.7%
s 62667
 
4.6%
l 51264
 
3.8%
Other values (77) 508140
37.6%
None
ValueCountFrequency (%)
é 3176
55.6%
ó 416
 
7.3%
á 317
 
5.6%
í 173
 
3.0%
ü 154
 
2.7%
ñ 150
 
2.6%
ô 140
 
2.5%
ä 137
 
2.4%
è 136
 
2.4%
ö 132
 
2.3%
Other values (76) 780
 
13.7%
Cyrillic
ValueCountFrequency (%)
и 34
 
9.1%
о 28
 
7.5%
а 26
 
7.0%
л 22
 
5.9%
н 20
 
5.4%
м 19
 
5.1%
т 17
 
4.6%
е 16
 
4.3%
с 16
 
4.3%
ь 16
 
4.3%
Other values (36) 159
42.6%
Hangul
ValueCountFrequency (%)
9
 
8.0%
8
 
7.1%
6
 
5.3%
5
 
4.4%
5
 
4.4%
5
 
4.4%
5
 
4.4%
5
 
4.4%
4
 
3.5%
3
 
2.7%
Other values (42) 58
51.3%
Punctuation
ValueCountFrequency (%)
3
42.9%
2
28.6%
1
 
14.3%
1
 
14.3%
CJK
ValueCountFrequency (%)
2
 
7.7%
2
 
7.7%
2
 
7.7%
2
 
7.7%
2
 
7.7%
2
 
7.7%
2
 
7.7%
1
 
3.8%
1
 
3.8%
1
 
3.8%
Other values (9) 9
34.6%

id_countrie
Categorical

HIGH CARDINALITY  IMBALANCE  MISSING 

Distinct: 2388
Distinct (%): 6.1%
Missing: 6211
Missing (%): 13.7%
Memory size: 354.6 KiB

US 17846
GB 2235
FR 1653
JP 1356
IT 1029
Other values (2383) 15046

Length

Max length: 74
Median length: 2
Mean length: 2.7846036
Min length: 2

Characters and Unicode

Total characters: 109059
Distinct characters: 27
Distinct categories: 2
Distinct scripts: 2
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 1764
Unique (%): 4.5%

Sample

1st row: US
2nd row: US
3rd row: US
4th row: US
5th row: US

Common Values

Value | Count | Frequency (%)
US | 17846 | 39.3%
GB | 2235 | 4.9%
FR | 1653 | 3.6%
JP | 1356 | 3.0%
IT | 1029 | 2.3%
CA | 840 | 1.9%
DE | 749 | 1.7%
IN | 735 | 1.6%
RU | 734 | 1.6%
GB,US | 569 | 1.3%
Other values (2378) | 11419 | 25.2%
(Missing) | 6211 | 13.7%
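
Entries such as GB,US are comma-joined combinations of country codes, so each distinct combination counts as its own category and raises the distinct count. The following is a minimal sketch of counting individual codes instead, assuming the column is loaded as a pandas Series of strings. The miniature series is hypothetical and not taken from the dataset.

```python
import pandas as pd

# Hypothetical miniature of the id_countrie column; None stands for a missing cell.
s = pd.Series(["US", "GB,US", "US,GB", "FR,US", None, "US"], name="id_countrie")

# Distinct combinations, which is what the profile counts as categories.
combinations = s.nunique()

# Split each cell on commas, give every code its own row, then count the codes.
per_code = (
    s.dropna()
     .str.split(",")
     .explode()
     .value_counts()
)

print(combinations)
print(per_code)
```

In this toy series the four distinct combinations reduce to three distinct codes once the cells are split.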

Length

Histogram of lengths of the category
ValueCountFrequency (%)
us 17846
45.6%
gb 2235
 
5.7%
fr 1653
 
4.2%
jp 1356
 
3.5%
it 1029
 
2.6%
ca 840
 
2.1%
de 749
 
1.9%
in 735
 
1.9%
ru 734
 
1.9%
gb,us 569
 
1.5%
Other values (2378) 11419
29.2%

Most occurring characters

ValueCountFrequency (%)
S 23041
21.1%
U 23024
21.1%
, 10243
9.4%
R 6686
 
6.1%
B 4982
 
4.6%
E 4752
 
4.4%
G 4448
 
4.1%
F 4342
 
4.0%
I 4010
 
3.7%
A 3136
 
2.9%
Other values (17) 20395
18.7%

Most occurring categories

ValueCountFrequency (%)
Uppercase Letter 98816
90.6%
Other Punctuation 10243
 
9.4%

Most frequent character per category

Uppercase Letter
ValueCountFrequency (%)
S 23041
23.3%
U 23024
23.3%
R 6686
 
6.8%
B 4982
 
5.0%
E 4752
 
4.8%
G 4448
 
4.5%
F 4342
 
4.4%
I 4010
 
4.1%
A 3136
 
3.2%
T 3007
 
3.0%
Other values (16) 17388
17.6%
Other Punctuation
ValueCountFrequency (%)
, 10243
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin 98816
90.6%
Common 10243
 
9.4%

Most frequent character per script

Latin
ValueCountFrequency (%)
S 23041
23.3%
U 23024
23.3%
R 6686
 
6.8%
B 4982
 
5.0%
E 4752
 
4.8%
G 4448
 
4.5%
F 4342
 
4.4%
I 4010
 
4.1%
A 3136
 
3.2%
T 3007
 
3.0%
Other values (16) 17388
17.6%
Common
ValueCountFrequency (%)
, 10243
100.0%

Most occurring blocks

ValueCountFrequency (%)
ASCII 109059
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
S 23041
21.1%
U 23024
21.1%
, 10243
9.4%
R 6686
 
6.1%
B 4982
 
4.6%
E 4752
 
4.4%
G 4448
 
4.1%
F 4342
 
4.0%
I 4010
 
3.7%
A 3136
 
2.9%
Other values (17) 20395
18.7%

name_countrie
Categorical

HIGH CARDINALITY  IMBALANCE  MISSING 

Distinct: 2388
Distinct (%): 6.1%
Missing: 6211
Missing (%): 13.7%
Memory size: 354.6 KiB

United States of America 17846
United Kingdom 2235
France 1653
Japan 1356
Italy 1029
Other values (2383) 15046

Length

Max length: 213
Median length: 153
Mean length: 18.785267
Min length: 4

Characters and Unicode

Total characters: 735725
Distinct characters: 53
Distinct categories: 4
Distinct scripts: 2
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 1764
Unique (%): 4.5%

Sample

1st row: United States of America
2nd row: United States of America
3rd row: United States of America
4th row: United States of America
5th row: United States of America

Common Values

Value | Count | Frequency (%)
United States of America | 17846 | 39.3%
United Kingdom | 2235 | 4.9%
France | 1653 | 3.6%
Japan | 1356 | 3.0%
Italy | 1029 | 2.3%
Canada | 840 | 1.9%
Germany | 749 | 1.7%
India | 735 | 1.6%
Russia | 734 | 1.6%
United Kingdom,United States of America | 569 | 1.3%
Other values (2378) | 11419 | 25.2%
(Missing) | 6211 | 13.7%

Length

Histogram of lengths of the category
ValueCountFrequency (%)
united 21501
19.8%
states 21148
19.5%
of 21147
19.5%
america 20329
18.7%
kingdom 2858
 
2.6%
france 1653
 
1.5%
japan 1356
 
1.2%
italy 1029
 
0.9%
kingdom,united 933
 
0.9%
canada 840
 
0.8%
Other values (2132) 15717
14.5%

Most occurring characters

ValueCountFrequency (%)
e 80649
 
11.0%
t 72619
 
9.9%
a 70488
 
9.6%
69346
 
9.4%
i 58548
 
8.0%
n 47495
 
6.5%
d 34545
 
4.7%
r 32490
 
4.4%
o 29580
 
4.0%
m 28704
 
3.9%
Other values (43) 211261
28.7%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter 558562
75.9%
Uppercase Letter 97569
 
13.3%
Space Separator 69346
 
9.4%
Other Punctuation 10248
 
1.4%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
e 80649
14.4%
t 72619
13.0%
a 70488
12.6%
i 58548
10.5%
n 47495
8.5%
d 34545
6.2%
r 32490
5.8%
o 29580
 
5.3%
m 28704
 
5.1%
c 26371
 
4.7%
Other values (16) 77073
13.8%
Uppercase Letter
ValueCountFrequency (%)
U 25367
26.0%
S 23836
24.4%
A 22389
22.9%
K 5218
 
5.3%
F 4334
 
4.4%
I 3585
 
3.7%
C 2594
 
2.7%
G 2473
 
2.5%
J 1664
 
1.7%
R 1307
 
1.3%
Other values (14) 4802
 
4.9%
Other Punctuation
ValueCountFrequency (%)
, 10243
> 99.9%
' 5
 
< 0.1%
Space Separator
ValueCountFrequency (%)
69346
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin 656131
89.2%
Common 79594
 
10.8%

Most frequent character per script

Latin
ValueCountFrequency (%)
e 80649
12.3%
t 72619
11.1%
a 70488
10.7%
i 58548
 
8.9%
n 47495
 
7.2%
d 34545
 
5.3%
r 32490
 
5.0%
o 29580
 
4.5%
m 28704
 
4.4%
c 26371
 
4.0%
Other values (40) 174642
26.6%
Common
ValueCountFrequency (%)
69346
87.1%
, 10243
 
12.9%
' 5
 
< 0.1%

Most occurring blocks

ValueCountFrequency (%)
ASCII 735725
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
e 80649
 
11.0%
t 72619
 
9.9%
a 70488
 
9.6%
69346
 
9.4%
i 58548
 
8.0%
n 47495
 
6.5%
d 34545
 
4.7%
r 32490
 
4.4%
o 29580
 
4.0%
m 28704
 
3.9%
Other values (43) 211261
28.7%

id_language
Categorical

HIGH CARDINALITY  IMBALANCE  MISSING 

Distinct: 1930
Distinct (%): 4.6%
Missing: 3768
Missing (%): 8.3%
Memory size: 354.6 KiB

en 22380
fr 1852
ja 1289
it 1217
es 901
Other values (1925) 13969

Length

Max length: 56
Median length: 2
Mean length: 2.8411363
Min length: 2

Characters and Unicode

Total characters: 118214
Distinct characters: 27
Distinct categories: 2
Distinct scripts: 2
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 1366
Unique (%): 3.3%

Sample

1st row: en
2nd row: en,fr
3rd row: en
4th row: en
5th row: en

Common Values

Value | Count | Frequency (%)
en | 22380 | 49.3%
fr | 1852 | 4.1%
ja | 1289 | 2.8%
it | 1217 | 2.7%
es | 901 | 2.0%
ru | 807 | 1.8%
de | 761 | 1.7%
en,fr | 681 | 1.5%
en,es | 572 | 1.3%
hi | 481 | 1.1%
Other values (1920) | 10667 | 23.5%
(Missing) | 3768 | 8.3%

Length

Histogram of lengths of the category
ValueCountFrequency (%)
en 22380
53.8%
fr 1852
 
4.5%
ja 1289
 
3.1%
it 1217
 
2.9%
es 901
 
2.2%
ru 807
 
1.9%
de 761
 
1.8%
en,fr 681
 
1.6%
en,es 572
 
1.4%
hi 481
 
1.2%
Other values (1920) 10667
25.6%

Most occurring characters

ValueCountFrequency (%)
e 34359
29.1%
n 29806
25.2%
, 11666
 
9.9%
r 6736
 
5.7%
f 4740
 
4.0%
t 3713
 
3.1%
i 3690
 
3.1%
s 3630
 
3.1%
d 2988
 
2.5%
a 2951
 
2.5%
Other values (17) 13935
11.8%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter 106548
90.1%
Other Punctuation 11666
 
9.9%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
e 34359
32.2%
n 29806
28.0%
r 6736
 
6.3%
f 4740
 
4.4%
t 3713
 
3.5%
i 3690
 
3.5%
s 3630
 
3.4%
d 2988
 
2.8%
a 2951
 
2.8%
h 2354
 
2.2%
Other values (16) 11581
 
10.9%
Other Punctuation
ValueCountFrequency (%)
, 11666
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin 106548
90.1%
Common 11666
 
9.9%

Most frequent character per script

Latin
ValueCountFrequency (%)
e 34359
32.2%
n 29806
28.0%
r 6736
 
6.3%
f 4740
 
4.4%
t 3713
 
3.5%
i 3690
 
3.5%
s 3630
 
3.4%
d 2988
 
2.8%
a 2951
 
2.8%
h 2354
 
2.2%
Other values (16) 11581
 
10.9%
Common
ValueCountFrequency (%)
, 11666
100.0%

Most occurring blocks

ValueCountFrequency (%)
ASCII 118214
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
e 34359
29.1%
n 29806
25.2%
, 11666
 
9.9%
r 6736
 
5.7%
f 4740
 
4.0%
t 3713
 
3.1%
i 3690
 
3.1%
s 3630
 
3.1%
d 2988
 
2.5%
a 2951
 
2.5%
Other values (17) 13935
11.8%

name_language
Categorical

HIGH CARDINALITY  IMBALANCE  MISSING 

Distinct: 1841
Distinct (%): 4.4%
Missing: 3891
Missing (%): 8.6%
Memory size: 354.6 KiB

English 22380
Français 1852
日本語 1289
Italiano 1217
Español 901
Other values (1836) 13846

Length

Max length: 153
Median length: 7
Mean length: 9.1166446
Min length: 1

Characters and Unicode

Total characters: 378204
Distinct characters: 171
Distinct categories: 8
Distinct scripts: 15
Distinct blocks: 16
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 1293
Unique (%): 3.1%

Sample

1st row: English
2nd row: English,Français
3rd row: English
4th row: English
5th row: English

Common Values

Value | Count | Frequency (%)
English | 22380 | 49.3%
Français | 1852 | 4.1%
日本語 | 1289 | 2.8%
Italiano | 1217 | 2.7%
Español | 901 | 2.0%
Pусский | 807 | 1.8%
Deutsch | 761 | 1.7%
English,Français | 681 | 1.5%
English,Español | 572 | 1.3%
हिन्दी | 481 | 1.1%
Other values (1831) | 10544 | 23.2%
(Missing) | 3891 | 8.6%

Length

Histogram of lengths of the category
ValueCountFrequency (%)
english 22461
52.4%
français 1859
 
4.3%
日本語 1290
 
3.0%
italiano 1219
 
2.8%
español 912
 
2.1%
pусский 813
 
1.9%
deutsch 765
 
1.8%
english,français 689
 
1.6%
english,español 576
 
1.3%
हिन्दी 489
 
1.1%
Other values (1730) 11825
27.6%

Most occurring characters

ValueCountFrequency (%)
s 42270
11.2%
n 37462
 
9.9%
i 37109
 
9.8%
l 34631
 
9.2%
h 31459
 
8.3%
E 31198
 
8.2%
g 30413
 
8.0%
a 18946
 
5.0%
, 11666
 
3.1%
o 7053
 
1.9%
Other values (161) 95997
25.4%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter 292028
77.2%
Uppercase Letter 46428
 
12.3%
Other Letter 22191
 
5.9%
Other Punctuation 12731
 
3.4%
Spacing Mark 1838
 
0.5%
Nonspacing Mark 1549
 
0.4%
Space Separator 1413
 
0.4%
Control 26
 
< 0.1%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
s 42270
14.5%
n 37462
12.8%
i 37109
12.7%
l 34631
11.9%
h 31459
10.8%
g 30413
10.4%
a 18946
6.5%
o 7053
 
2.4%
r 6128
 
2.1%
t 5977
 
2.0%
Other values (63) 40580
13.9%
Other Letter
ValueCountFrequency (%)
1758
 
7.9%
1758
 
7.9%
1758
 
7.9%
1263
 
5.7%
946
 
4.3%
790
 
3.6%
790
 
3.6%
707
 
3.2%
707
 
3.2%
707
 
3.2%
Other values (46) 11007
49.6%
Uppercase Letter
ValueCountFrequency (%)
E 31198
67.2%
F 4196
 
9.0%
D 2926
 
6.3%
P 2677
 
5.8%
I 2366
 
5.1%
N 829
 
1.8%
L 505
 
1.1%
M 362
 
0.8%
T 308
 
0.7%
Č 284
 
0.6%
Other values (13) 777
 
1.7%
Spacing Mark
ValueCountFrequency (%)
ि 707
38.5%
707
38.5%
136
 
7.4%
ி 111
 
6.0%
94
 
5.1%
47
 
2.6%
18
 
1.0%
18
 
1.0%
Nonspacing Mark
ValueCountFrequency (%)
707
45.6%
ִ 430
27.8%
ְ 215
 
13.9%
111
 
7.2%
68
 
4.4%
18
 
1.2%
Other Punctuation
ValueCountFrequency (%)
, 11666
91.6%
/ 1015
 
8.0%
? 50
 
0.4%
Space Separator
ValueCountFrequency (%)
1413
100.0%
Control
ValueCountFrequency (%)
š 26
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin 326067
86.2%
Common 14170
 
3.7%
Han 10482
 
2.8%
Cyrillic 10454
 
2.8%
Devanagari 4242
 
1.1%
Arabic 3344
 
0.9%
Hangul 3252
 
0.9%
Hebrew 1720
 
0.5%
Greek 1704
 
0.5%
Thai 1232
 
0.3%
Other values (5) 1537
 
0.4%

Most frequent character per script

Latin
ValueCountFrequency (%)
s 42270
13.0%
n 37462
11.5%
i 37109
11.4%
l 34631
10.6%
h 31459
9.6%
E 31198
9.6%
g 30413
9.3%
a 18946
 
5.8%
o 7053
 
2.2%
r 6128
 
1.9%
Other values (50) 49398
15.1%
Cyrillic
ValueCountFrequency (%)
с 3211
30.7%
к 1734
16.6%
и 1679
16.1%
й 1615
15.4%
у 1564
15.0%
а 113
 
1.1%
р 87
 
0.8%
У 53
 
0.5%
ї 53
 
0.5%
н 53
 
0.5%
Other values (12) 292
 
2.8%
Arabic
ValueCountFrequency (%)
ا 537
16.1%
ر 537
16.1%
ع 341
10.2%
ب 341
10.2%
ي 341
10.2%
ة 341
10.2%
ل 341
10.2%
ی 141
 
4.2%
ف 141
 
4.2%
س 141
 
4.2%
Other values (5) 142
 
4.2%
Han
ValueCountFrequency (%)
1758
16.8%
1758
16.8%
1758
16.8%
1263
12.0%
946
9.0%
790
7.5%
790
7.5%
473
 
4.5%
473
 
4.5%
广 473
 
4.5%
Hebrew
ValueCountFrequency (%)
ִ 430
25.0%
ת 215
12.5%
י 215
12.5%
ר 215
12.5%
ְ 215
12.5%
ב 215
12.5%
ע 215
12.5%
Greek
ValueCountFrequency (%)
λ 426
25.0%
ά 213
12.5%
κ 213
12.5%
ι 213
12.5%
ν 213
12.5%
ε 213
12.5%
η 213
12.5%
Georgian
ValueCountFrequency (%)
33
14.3%
33
14.3%
33
14.3%
33
14.3%
33
14.3%
33
14.3%
33
14.3%
Devanagari
ValueCountFrequency (%)
ि 707
16.7%
707
16.7%
707
16.7%
707
16.7%
707
16.7%
707
16.7%
Hangul
ValueCountFrequency (%)
542
16.7%
542
16.7%
542
16.7%
542
16.7%
542
16.7%
542
16.7%
Thai
ValueCountFrequency (%)
352
28.6%
176
14.3%
176
14.3%
176
14.3%
176
14.3%
176
14.3%
Gurmukhi
ValueCountFrequency (%)
18
16.7%
18
16.7%
18
16.7%
18
16.7%
18
16.7%
18
16.7%
Common
ValueCountFrequency (%)
, 11666
82.3%
1413
 
10.0%
/ 1015
 
7.2%
? 50
 
0.4%
š 26
 
0.2%
Telugu
ValueCountFrequency (%)
136
33.3%
68
16.7%
68
16.7%
68
16.7%
68
16.7%
Tamil
ValueCountFrequency (%)
111
20.0%
ி 111
20.0%
111
20.0%
111
20.0%
111
20.0%
Bengali
ValueCountFrequency (%)
94
40.0%
47
20.0%
47
20.0%
47
20.0%

Most occurring blocks

ValueCountFrequency (%)
ASCII 331381
87.6%
CJK 10482
 
2.8%
Cyrillic 10454
 
2.8%
None 10434
 
2.8%
Devanagari 4242
 
1.1%
Arabic 3344
 
0.9%
Hangul 3252
 
0.9%
Hebrew 1720
 
0.5%
Thai 1232
 
0.3%
Tamil 555
 
0.1%
Other values (6) 1108
 
0.3%

Most frequent character per block

ASCII
ValueCountFrequency (%)
s 42270
12.8%
n 37462
11.3%
i 37109
11.2%
l 34631
10.5%
h 31459
9.5%
E 31198
9.4%
g 30413
9.2%
a 18946
 
5.7%
, 11666
 
3.5%
o 7053
 
2.1%
Other values (38) 49174
14.8%
None
ValueCountFrequency (%)
ç 4441
42.6%
ñ 2412
23.1%
ê 591
 
5.7%
λ 426
 
4.1%
ý 284
 
2.7%
Č 284
 
2.7%
ü 247
 
2.4%
ά 213
 
2.0%
κ 213
 
2.0%
ι 213
 
2.0%
Other values (11) 1110
 
10.6%
Cyrillic
ValueCountFrequency (%)
с 3211
30.7%
к 1734
16.6%
и 1679
16.1%
й 1615
15.4%
у 1564
15.0%
а 113
 
1.1%
р 87
 
0.8%
У 53
 
0.5%
ї 53
 
0.5%
н 53
 
0.5%
Other values (12) 292
 
2.8%
CJK
ValueCountFrequency (%)
1758
16.8%
1758
16.8%
1758
16.8%
1263
12.0%
946
9.0%
790
7.5%
790
7.5%
473
 
4.5%
473
 
4.5%
广 473
 
4.5%
Devanagari
ValueCountFrequency (%)
ि 707
16.7%
707
16.7%
707
16.7%
707
16.7%
707
16.7%
707
16.7%
Hangul
ValueCountFrequency (%)
542
16.7%
542
16.7%
542
16.7%
542
16.7%
542
16.7%
542
16.7%
Arabic
ValueCountFrequency (%)
ا 537
16.1%
ر 537
16.1%
ع 341
10.2%
ب 341
10.2%
ي 341
10.2%
ة 341
10.2%
ل 341
10.2%
ی 141
 
4.2%
ف 141
 
4.2%
س 141
 
4.2%
Other values (5) 142
 
4.2%
Hebrew
ValueCountFrequency (%)
ִ 430
25.0%
ת 215
12.5%
י 215
12.5%
ר 215
12.5%
ְ 215
12.5%
ב 215
12.5%
ע 215
12.5%
Thai
ValueCountFrequency (%)
352
28.6%
176
14.3%
176
14.3%
176
14.3%
176
14.3%
176
14.3%
Telugu
ValueCountFrequency (%)
136
33.3%
68
16.7%
68
16.7%
68
16.7%
68
16.7%
Tamil
ValueCountFrequency (%)
111
20.0%
ி 111
20.0%
111
20.0%
111
20.0%
111
20.0%
Bengali
ValueCountFrequency (%)
94
40.0%
47
20.0%
47
20.0%
47
20.0%
Latin Ext Additional
ValueCountFrequency (%)
ế 61
50.0%
61
50.0%
Georgian
ValueCountFrequency (%)
33
14.3%
33
14.3%
33
14.3%
33
14.3%
33
14.3%
33
14.3%
33
14.3%
Gurmukhi
ValueCountFrequency (%)
18
16.7%
18
16.7%
18
16.7%
18
16.7%
18
16.7%
18
16.7%
IPA Ext
ValueCountFrequency (%)
ə 4
100.0%

Interactions

[Interaction plots: 100 Matplotlib figures, not reproduced here]

Correlations

Variable | Unnamed: 0 | budget | id | popularity | revenue | runtime | vote_average | release_year | return | id_collection | original_language | status
Unnamed: 0 | 1.000 | -0.256 | 0.619 | -0.392 | -0.305 | -0.235 | -0.155 | 0.336 | -0.270 | 0.305 | 0.112 | 0.026
budget | -0.256 | 1.000 | -0.255 | 0.463 | 0.644 | 0.227 | 0.072 | 0.141 | 0.775 | -0.299 | 0.000 | 0.000
id | 0.619 | -0.255 | 1.000 | -0.410 | -0.278 | -0.205 | -0.149 | 0.392 | -0.262 | 0.428 | 0.071 | 0.056
popularity | -0.392 | 0.463 | -0.410 | 1.000 | 0.491 | 0.307 | 0.241 | 0.186 | 0.447 | -0.347 | 0.000 | 0.000
revenue | -0.305 | 0.644 | -0.278 | 0.491 | 1.000 | 0.254 | 0.127 | 0.104 | 0.853 | -0.326 | 0.000 | 0.000
runtime | -0.235 | 0.227 | -0.205 | 0.307 | 0.254 | 1.000 | 0.193 | 0.034 | 0.234 | -0.135 | 0.111 | 0.000
vote_average | -0.155 | 0.072 | -0.149 | 0.241 | 0.127 | 0.193 | 1.000 | -0.009 | 0.120 | -0.012 | 0.070 | 0.019
release_year | 0.336 | 0.141 | 0.392 | 0.186 | 0.104 | 0.034 | -0.009 | 1.000 | 0.087 | 0.040 | 0.144 | 0.028
return | -0.270 | 0.775 | -0.262 | 0.447 | 0.853 | 0.234 | 0.120 | 0.087 | 1.000 | -0.315 | 0.000 | 0.000
id_collection | 0.305 | -0.299 | 0.428 | -0.347 | -0.326 | -0.135 | -0.012 | 0.040 | -0.315 | 1.000 | 0.150 | 0.000
original_language | 0.112 | 0.000 | 0.071 | 0.000 | 0.000 | 0.111 | 0.070 | 0.144 | 0.000 | 0.150 | 1.000 | 0.000
status | 0.026 | 0.000 | 0.056 | 0.000 | 0.000 | 0.000 | 0.019 | 0.028 | 0.000 | 0.000 | 0.000 | 1.000
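
A matrix of this kind can be computed for numeric columns with pandas. The report does not state which coefficient it uses, so the sketch below assumes Spearman rank correlation. The three columns hold hypothetical toy values, not rows from the profiled dataset.

```python
import pandas as pd

# Hypothetical toy values chosen so the result is easy to check by hand.
df = pd.DataFrame({
    "budget":  [1.0, 2.0, 3.0, 4.0, 5.0],
    "revenue": [2.0, 1.0, 4.0, 3.0, 5.0],
    "runtime": [5.0, 4.0, 3.0, 2.0, 1.0],
})

# Spearman correlation: the Pearson correlation of the per-column ranks.
corr = df.corr(method="spearman")
print(corr.round(3))
```

The result is symmetric with ones on the diagonal, matching the layout of the matrix above.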

Missing values

2023-05-13T16:21:02.562079image/svg+xmlMatplotlib v3.5.2, https://matplotlib.org/
A simple visualization of nullity by column.
2023-05-13T16:21:03.759722image/svg+xmlMatplotlib v3.5.2, https://matplotlib.org/
The nullity matrix is a data-dense display that lets you quickly pick out patterns in data completion.
The correlation heatmap measures nullity correlation: how strongly the presence or absence of one variable affects the presence of another.
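
The nullity count and nullity correlation described above can be approximated directly in pandas. This is a minimal sketch, not the report's own implementation: it assumes the dataset is loaded into a pandas DataFrame, and the function name `nullity_summary` and the toy frame are hypothetical, chosen only for illustration.

```python
import pandas as pd


def nullity_summary(df):
    """Return per-column missing counts and a nullity correlation matrix.

    Hypothetical helper for illustration; not part of any profiling library.
    """
    is_missing = df.isnull()
    counts = is_missing.sum()
    # Columns that are never or always missing have zero variance, so their
    # correlation is undefined; keep only columns with some missing values.
    varying = [col for col in df.columns if 0 < counts[col] < len(df)]
    # Pearson correlation of the 0/1 missingness indicators.
    corr = is_missing[varying].astype(int).corr()
    return counts, corr


# Toy frame (made-up values): the two collection columns are missing together,
# while tagline is missing exactly where they are present.
toy = pd.DataFrame({
    "title": ["a", "b", "c", "d"],
    "tagline": [None, "x", None, "y"],
    "id_collection": [1.0, None, 2.0, None],
    "name_collection": ["A", None, "B", None],
})

counts, corr = nullity_summary(toy)
print(counts)
print(corr)
```

A value near 1 means two columns tend to be missing in the same rows, a value near -1 means one tends to be present when the other is missing, and a value near 0 means their missingness is unrelated.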

Sample

First rows

|  | Unnamed: 0 | budget | id | original_language | overview | popularity | release_date | revenue | runtime | status | tagline | title | vote_average | release_year | return | id_collection | name_collection | id_genres | name_genres | id_production | name_production | id_countrie | name_countrie | id_language | name_language |
| 0 | 0 | 30000000.0 | 862 | en | Led by Woody, Andy's toys live happily in his room until Andy's birthday brings Buzz Lightyear onto the scene. Afraid of losing his place in Andy's heart, Woody plots against Buzz. But when circumstances separate Buzz and Woody from their owner, the duo eventually learns to put aside their differences. | 21.946943 | 1995-10-30 | 373554033.0 | 81.0 | Released | NaN | Toy Story | 7.7 | 1995 | 12.451801 | 10194.0 | Toy Story Collection | 16.0,35.0,10751.0 | Animation,Comedy,Family | 3.0 | Pixar Animation Studios | US | United States of America | en | English |
| 1 | 1 | 65000000.0 | 8844 | en | When siblings Judy and Peter discover an enchanted board game that opens the door to a magical world, they unwittingly invite Alan -- an adult who's been trapped inside the game for 26 years -- into their living room. Alan's only hope for freedom is to finish the game, which proves risky as all three find themselves running from giant rhinoceroses, evil monkeys and other terrifying creatures. | 17.015539 | 1995-12-15 | 262797249.0 | 104.0 | Released | Roll the dice and unleash the excitement! | Jumanji | 6.9 | 1995 | 4.043035 | NaN | NaN | 12.0,14.0,10751.0 | Adventure,Fantasy,Family | 559.0,2550.0,10201.0 | TriStar Pictures,Teitler Film,Interscope Communications | US | United States of America | en,fr | English,Français |
| 2 | 2 | 0.0 | 15602 | en | A family wedding reignites the ancient feud between next-door neighbors and fishing buddies John and Max. Meanwhile, a sultry Italian divorcée opens a restaurant at the local bait shop, alarming the locals who worry she'll scare the fish away. But she's less interested in seafood than she is in cooking up a hot time with Max. | 11.712900 | 1995-12-22 | 0.0 | 101.0 | Released | Still Yelling. Still Fighting. Still Ready for Love. | Grumpier Old Men | 6.5 | 1995 | 0.000000 | 119050.0 | Grumpy Old Men Collection | 10749.0,35.0 | Romance,Comedy | 6194.0,19464.0 | Warner Bros.,Lancaster Gate | US | United States of America | en | English |
| 3 | 3 | 16000000.0 | 31357 | en | Cheated on, mistreated and stepped on, the women are holding their breath, waiting for the elusive "good man" to break a string of less-than-stellar lovers. Friends and confidants Vannah, Bernie, Glo and Robin talk it all out, determined to find a better way to breathe. | 3.859495 | 1995-12-22 | 81452156.0 | 127.0 | Released | Friends are the people who let you be yourself... and never let you forget it. | Waiting to Exhale | 6.1 | 1995 | 5.090760 | NaN | NaN | 35.0,18.0,10749.0 | Comedy,Drama,Romance | 306.0 | Twentieth Century Fox Film Corporation | US | United States of America | en | English |
| 4 | 4 | 0.0 | 11862 | en | Just when George Banks has recovered from his daughter's wedding, he receives the news that she's pregnant ... and that George's wife, Nina, is expecting too. He was planning on selling their home, but that's a plan that -- like George -- will have to change with the arrival of both a grandchild and a kid of his own. | 8.387519 | 1995-02-10 | 76578911.0 | 106.0 | Released | Just When His World Is Back To Normal... He's In For The Surprise Of His Life! | Father of the Bride Part II | 5.7 | 1995 | 0.000000 | 96871.0 | Father of the Bride Collection | 35.0 | Comedy | 5842.0,9195.0 | Sandollar Productions,Touchstone Pictures | US | United States of America | en | English |
| 5 | 5 | 60000000.0 | 949 | en | Obsessive master thief, Neil McCauley leads a top-notch crew on various insane heists throughout Los Angeles while a mentally unstable detective, Vincent Hanna pursues him without rest. Each man recognizes and respects the ability and the dedication of the other even though they are aware their cat-and-mouse game may end in violence. | 17.924927 | 1995-12-15 | 187436818.0 | 170.0 | Released | A Los Angeles Crime Saga | Heat | 7.7 | 1995 | 3.123947 | NaN | NaN | 28.0,80.0,18.0,53.0 | Action,Crime,Drama,Thriller | 508.0,675.0,6194.0 | Regency Enterprises,Forward Pass,Warner Bros. | US | United States of America | en,es | English,Español |
| 6 | 6 | 58000000.0 | 11860 | en | An ugly duckling having undergone a remarkable change, still harbors feelings for her crush: a carefree playboy, but not before his business-focused brother has something to say about it. | 6.677277 | 1995-12-15 | 0.0 | 127.0 | Released | You are cordially invited to the most surprising merger of the year. | Sabrina | 6.2 | 1995 | 0.000000 | NaN | NaN | 35.0,10749.0 | Comedy,Romance | 4.0,258.0,932.0,5842.0,14941.0,55873.0,58079.0 | Paramount Pictures,Scott Rudin Productions,Mirage Enterprises,Sandollar Productions,Constellation Entertainment,Worldwide,Mont Blanc Entertainment GmbH | DE,US | Germany,United States of America | fr,en | Français,English |
| 7 | 7 | 0.0 | 45325 | en | A mischievous young boy, Tom Sawyer, witnesses a murder by the deadly Injun Joe. Tom becomes friends with Huckleberry Finn, a boy with no future and no family. Tom has to choose between honoring a friendship or honoring an oath because the town alcoholic is accused of the murder. Tom and Huck go through several adventures trying to retrieve evidence. | 2.561161 | 1995-12-22 | 0.0 | 97.0 | Released | The Original Bad Boys. | Tom and Huck | 5.4 | 1995 | 0.000000 | NaN | NaN | 28.0,12.0,18.0,10751.0 | Action,Adventure,Drama,Family | 2.0 | Walt Disney Pictures | US | United States of America | en,de | English,Deutsch |
| 8 | 8 | 35000000.0 | 9091 | en | International action superstar Jean Claude Van Damme teams with Powers Boothe in a Tension-packed, suspense thriller, set against the back-drop of a Stanley Cup game.Van Damme portrays a father whose daughter is suddenly taken during a championship hockey game. With the captors demanding a billion dollars by game's end, Van Damme frantically sets a plan in motion to rescue his daughter and abort an impending explosion before the final buzzer... | 5.231580 | 1995-12-22 | 64350171.0 | 106.0 | Released | Terror goes into overtime. | Sudden Death | 5.5 | 1995 | 1.838576 | NaN | NaN | 28.0,12.0,53.0 | Action,Adventure,Thriller | 33.0,21437.0,23770.0 | Universal Pictures,Imperial Entertainment,Signature Entertainment | US | United States of America | en | English |
| 9 | 9 | 58000000.0 | 710 | en | James Bond must unmask the mysterious head of the Janus Syndicate and prevent the leader from utilizing the GoldenEye weapons system to inflict devastating revenge on Britain. | 14.686036 | 1995-11-16 | 352194034.0 | 130.0 | Released | No limits. No fears. No substitutes. | GoldenEye | 6.6 | 1995 | 6.072311 | 645.0 | James Bond Collection | 12.0,28.0,53.0 | Adventure,Action,Thriller | 60.0,7576.0 | United Artists,Eon Productions | GB,US | United Kingdom,United States of America | en,ru,es | English,Pусский,Español |

Last rows

|  | Unnamed: 0 | budget | id | original_language | overview | popularity | release_date | revenue | runtime | status | tagline | title | vote_average | release_year | return | id_collection | name_collection | id_genres | name_genres | id_production | name_production | id_countrie | name_countrie | id_language | name_language |
| 45366 | 45366 | 0.0 | 67179 | it | Sentenced to life imprisonment for illegal activities, Italian International member Giulio Manieri holds on to his political ideals while struggling against madness in the loneliness of his prison cell. | 0.225051 | 1972-01-01 | 0.0 | 90.0 | Released | NaN | St. Michael Had a Rooster | 6.0 | 1972 | 0.0 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | it | Italiano |
| 45367 | 45367 | 0.0 | 84419 | en | An unsuccessful sculptor saves a madman named "The Creeper" from drowning. Seeing an opportunity for revenge, he tricks the psycho into murdering his critics. | 0.222814 | 1946-03-29 | 0.0 | 65.0 | Released | Meet...The CREEPER! | House of Horrors | 6.3 | 1946 | 0.0 | NaN | NaN | 27.0,9648.0,53.0 | Horror,Mystery,Thriller | 33.0 | Universal Pictures | US | United States of America | en | English |
| 45368 | 45368 | 0.0 | 390959 | en | In this true-crime documentary, we delve into the murder spree that was the inspiration for Joe Berlinger's "Book of Shadows: Blair Witch 2". | 0.076061 | 2000-10-22 | 0.0 | 45.0 | Released | NaN | Shadow of the Blair Witch | 7.0 | 2000 | 0.0 | NaN | NaN | 9648.0,27.0 | Mystery,Horror | NaN | NaN | NaN | NaN | en | English |
| 45369 | 45369 | 0.0 | 289923 | en | A film archivist revisits the story of Rustin Parr, a hermit thought to have murdered seven children while under the possession of the Blair Witch. | 0.386450 | 2000-10-03 | 0.0 | 30.0 | Released | Do you know what happened 50 years before "The Blair Witch Project"? | The Burkittsville 7 | 7.0 | 2000 | 0.0 | NaN | NaN | 27.0 | Horror | 27570.0,27571.0 | Neptune Salad Entertainment,Pirie Productions | US | United States of America | en | English |
| 45370 | 45370 | 0.0 | 222848 | en | It's the year 3000 AD. The world's most dangerous women are banished to a remote asteroid 45 million light years from earth. Kira Murphy doesn't belong; wrongfully accused of a crime she did not commit, she's thrown in this interplanetary prison and left to her own defenses. But Kira's a fighter, and soon she finds herself in the middle of a female gang war; where everyone wants a piece of the action... and a piece of her! "Caged Heat 3000" takes the Women-in-Prison genre to a whole new level... and a whole new galaxy! | 0.661558 | 1995-01-01 | 0.0 | 85.0 | Released | NaN | Caged Heat 3000 | 3.5 | 1995 | 0.0 | NaN | NaN | 878.0 | Science Fiction | 4688.0 | Concorde-New Horizons | US | United States of America | en | English |
| 45371 | 45371 | 0.0 | 30840 | en | Yet another version of the classic epic, with enough variation to make it interesting. The story is the same, but some of the characters are quite different from the usual, in particular Uma Thurman's very special maid Marian. The photography is also great, giving the story a somewhat darker tone. | 5.683753 | 1991-05-13 | 0.0 | 104.0 | Released | NaN | Robin Hood | 5.7 | 1991 | 0.0 | NaN | NaN | 18.0,28.0,10749.0 | Drama,Action,Romance | 7025.0,10163.0,16323.0,38978.0 | Westdeutscher Rundfunk (WDR),Working Title Films,20th Century Fox Television,CanWest Global Communications | CA,DE,GB,US | Canada,Germany,United Kingdom,United States of America | en | English |
| 45372 | 45372 | 0.0 | 111109 | tl | An artist struggles to finish his work while a storyline about a cult plays in his head. | 0.178241 | 2011-11-17 | 0.0 | 360.0 | Released | NaN | Century of Birthing | 9.0 | 2011 | 0.0 | NaN | NaN | 18.0 | Drama | 19653.0 | Sine Olivia | PH | Philippines | tl | NaN |
| 45373 | 45373 | 0.0 | 67758 | en | When one of her hits goes wrong, a professional assassin ends up with a suitcase full of a million dollars belonging to a mob boss ... | 0.903007 | 2003-08-01 | 0.0 | 90.0 | Released | A deadly game of wits. | Betrayal | 3.8 | 2003 | 0.0 | NaN | NaN | 28.0,18.0,53.0 | Action,Drama,Thriller | 6165.0 | American World Pictures | US | United States of America | en | English |
| 45374 | 45374 | 0.0 | 227506 | en | In a small town live two brothers, one a minister and the other one a hunchback painter of the chapel who lives with his wife. One dreadful and stormy night, a stranger knocks at the door asking for shelter. The stranger talks about all the good things of the earthly life the minister is missing because of his puritanical faith. The minister comes to accept the stranger's viewpoint but it is others who will pay the consequences because the minister will discover the human pleasures thanks to, ehem, his sister- in -law… The tormented minister and his cuckolded brother will die in a strange accident in the chapel and later an infant will be born from the minister's adulterous relationship. | 0.003503 | 1917-10-21 | 0.0 | 87.0 | Released | NaN | Satan Triumphant | 0.0 | 1917 | 0.0 | NaN | NaN | NaN | NaN | 88753.0 | Yermoliev | RU | Russia | NaN | NaN |
| 45375 | 45375 | 0.0 | 461257 | en | 50 years after decriminalisation of homosexuality in the UK, director Daisy Asquith mines the jewels of the BFI archive to take us into the relationships, desires, fears and expressions of gay men and women in the 20th century. | 0.163015 | 2017-06-09 | 0.0 | 75.0 | Released | NaN | Queerama | 0.0 | 2017 | 0.0 | NaN | NaN | NaN | NaN | NaN | NaN | GB | United Kingdom | en | English |